Abstract

Adaptive networks consist of a collection of agents with adaptation and learning abilities. The 
agents interact with each other on a local level and diffuse information across the network through 
their collaborations. In this work, we consider two types of agents: informed agents and uninformed 
agents. The former receive new data regularly and perform consultation and in-network tasks, while the 
latter do not collect data and only participate in the consultation tasks. We examine the performance 
of adaptive networks as a function of the proportion of informed agents and their distribution in space. 
The results reveal some interesting and surprising trade-offs between convergence rate and mean-square 
performance. In particular, among other results, it is shown that the performance of adaptive networks 
does not necessarily improve with a larger proportion of informed agents. Instead, it is established that
the larger the proportion of informed agents is, the faster the convergence rate of the network becomes,
albeit at the expense of some deterioration in mean-square performance. The results further establish that
uninformed agents play an important role in determining the steady-state performance of the network, and
that it is preferable to keep some of the highly connected agents uninformed. The arguments reveal an
important interplay among three factors: the number and distribution of informed agents in the network,
the convergence rate of the learning process, and the estimation accuracy in steady-state. Expressions
that quantify these relations are derived, and simulations are included to support the theoretical findings.
We further apply the results to two models that are widely used to represent behavior over complex 
networks, namely, the Erdos-Renyi and scale-free models.

I. Introduction

Adaptive networks consist of a collection of spatially distributed nodes that are linked together through 
a connection topology and that cooperate with each other through local interactions. Adaptive networks 
are well-suited to perform decentralized information processing and inference tasks 0, Q and to model 
complex and self-organized behavior encountered in biological systems 0, Q, such as fish joining 
together in schools [6] and birds flying in formation [7].

In previous works on adaptive networks [2], [3], E), and in other related studies on distributed and
combination algorithms [8]-[21], the agents are usually assumed to be homogeneous in that they all have
similar processing capabilities and are able to have continuous access to information and measurements.
However, it is generally observed in nature that the behavior of a biological network is often driven more
heavily by a small fraction of the agents as happens, for example, with bees and fish [22]-[24]. This
phenomenon motivates us to study in this paper adaptive networks where only a fraction of the nodes are 
assumed to be informed, while the remaining nodes are uninformed. Informed nodes collect data regularly 
and perform in-network processing tasks, while uninformed nodes only participate in consultation tasks 
in the manner explained in the sequel. 

We shall examine how the transient and steady-state behavior of the network are dependent on its 
topology and on the proportion of the informed nodes and their distribution in space. The results will 
reveal some interesting and surprising trade-offs between convergence rate and mean-square performance. 
In particular, among other results, the analysis will show that the performance of adaptive networks does 
not necessarily improve with a larger proportion of informed nodes. Instead, it is discovered that the larger 
the proportion of informed nodes is, the faster the convergence rate of the network becomes albeit at the 
expense of some deterioration in mean-square performance. The results also establish that uninformed 
nodes play an important role in determining the steady-state performance of the network, and that it is 
preferable to maintain some of the highly connected nodes uninformed. The analysis in the paper reveals 
the important interplay that exists among three factors: the number of informed nodes in a network, 
the convergence rate of the learning process, and the estimation accuracy. We shall further apply the 
results to two topology models that are widely used in the complex network literature [25], namely, the
Erdos-Renyi and scale-free models. 



To establish the aforementioned results, a detailed mean-square-error analysis of the network behavior 
is pursued. However, the difficulty of the analysis is compounded by the fact that nodes interact with 
each other and, therefore, they influence each other's learning process and performance. Nevertheless, 
for sufficiently small step-sizes, we will be able to derive an expression for the network's mean-square 
deviation (MSD). By examining this expression, we will establish that the MSD is influenced by the 
eigen-structure of two matrices: the covariance matrix representing the data statistical profile and the 
combination matrix representing the network topology. We then study the eigen-structure of these matrices 
and derive useful approximate expressions for their eigenvalues and eigenvectors. The expressions are 
subsequently used to reveal that the network MSD can be decomposed into two components. We study 
the behavior of each component as a function of the proportion of informed nodes; both components show 
important differences in their behavior. When the components are added together, a picture emerges that 
shows how the performance of the network depends on the proportion of informed nodes in a manner
that supports analytically the popular wisdom that more information is not necessarily better [26].

The organization of the paper is as follows. In Sections II and III, we review the diffusion adaptation 
strategy and establish conditions for the mean and mean-square stability of the networks in the presence of 
uninformed nodes. In Section IV, we introduce two popular models from the complex network literature. 
In Section V, we analyze in some detail the structure of the mean-square performance of the networks and 
reveal the effect of the network topology and node distribution on learning and adaptation. Simulation 
results appear in Section V in support of the theoretical findings. 

II. Diffusion Adaptation Strategy 

Consider a collection of N nodes distributed over a domain in space. Two nodes are said to be 
neighbors if they can share information. The set of neighbors of node k, including k itself, is called 
the neighborhood of $k$ and is denoted by $\mathcal{N}_k$. The nodes would like to estimate some unknown column
vector, $w^o$, of size $M$. At every time instant, $i$, each node $k$ is able to observe realizations $\{d_k(i), u_{k,i}\}$ of
a scalar random process $d_k(i)$ and a $1 \times M$ vector random process $u_{k,i}$ with a positive-definite covariance
matrix, $R_{u,k} = E u_{k,i}^* u_{k,i} > 0$, where $E$ denotes the expectation operator. All vectors in our treatment
are column vectors with the exception of the regression vector, $u_{k,i}$, which is taken to be a row vector
for convenience of presentation. The random processes $\{d_k(i), u_{k,i}\}$ are assumed to be related to $w^o$ via
a linear regression model of the form [27]:

$$ d_k(i) = u_{k,i} w^o + v_k(i) \tag{1} $$



where $v_k(i)$ is measurement noise with variance $\sigma_{v,k}^2$ and assumed to be spatially and temporally
independent with

$$ E v_k^*(i) v_l(j) = \sigma_{v,k}^2 \cdot \delta_{kl} \cdot \delta_{ij} \tag{2} $$

in terms of the Kronecker delta function. The noise $v_k(i)$ is assumed to be independent of $u_{l,j}$ for all
$l$ and $j$. The regression data $u_{k,i}$ is likewise assumed to be spatially and temporally independent. All
random processes are assumed to be zero mean. Note that we use boldface letters to denote random
quantities and normal letters to denote their realizations or deterministic quantities. Models of the form
(1)-(2) are useful in capturing many situations of interest, such as estimating the parameters of some
underlying physical phenomenon, or tracking a moving target by a collection of nodes, or estimating the 
location of a nutrient source or predator in biological networks (see, e.g., 0, 0). 

The objective of the network is to estimate w° in a distributed manner through an online learning 
process, where each node is allowed to interact only with its neighbors. The nodes estimate w° by 
seeking to minimize the following global cost function: 

$$ J^{\mathrm{glob}}(w) \triangleq \sum_{k=1}^{N} E\,|d_k(i) - u_{k,i} w|^2 \tag{3} $$

Several diffusion adaptation schemes for solving (3) in a distributed manner were proposed in [2], [3],
[28]; the latter reference considers more general cost functions. It was shown in these references, through
a constructive stochastic approximation and incremental argument, that the structure of a near-optimal
distributed solution for (3) takes the form of the Adapt-then-Combine (ATC) strategy of [3]; this strategy
can be shown to outperform other strategies in terms of mean-square performance, including consensus-based
strategies [29], [30]. Hence, we focus in this work on ATC updates. The ATC strategy operates as
follows. We select an $N \times N$ left-stochastic matrix $A$ with nonnegative entries $\{a_{l,k} \ge 0\}$ satisfying:

$$ A^T \mathbb{1} = \mathbb{1} \quad \text{and} \quad a_{l,k} = 0 \ \text{if, and only if,} \ l \notin \mathcal{N}_k \tag{4} $$

where $\mathbb{1}$ is a vector of size $N$ with all entries equal to one. The entry $a_{l,k}$ denotes the weight on the link
connecting node $l$ to node $k$, as shown in Fig. 1. Thus, condition (4) states that the weights on all links
arriving at node $k$ add up to one. Moreover, if two nodes $l$ and $k$ are linked, then their corresponding
entry $a_{l,k}$ is positive; otherwise, $a_{l,k}$ is zero.

Fig. 1. A connected network with informed and uninformed nodes. The weight $a_{l,k}$ scales the data transmitted from node $l$
to node $k$ over the edge linking them. (Legend: informed node; uninformed node.)

The ATC strategy consists of two steps. The first step (5a)
involves local adaptation, where node $k$ uses its own data $\{d_k(i), u_{k,i}\}$. This step updates the weight
estimate at node $k$ from $w_{k,i-1}$ to an intermediate value $\psi_{k,i}$. The second step (5b) is a combination
(consultation) step where the intermediate estimates $\{\psi_{l,i}\}$ from the neighborhood are combined through
the weights $\{a_{l,k}\}$ to obtain the updated weight estimate $w_{k,i}$. The ATC strategy is described as follows:

$$ \psi_{k,i} = w_{k,i-1} + \mu_k u_{k,i}^* [d_k(i) - u_{k,i} w_{k,i-1}] \tag{5a} $$

$$ w_{k,i} = \sum_{l \in \mathcal{N}_k} a_{l,k} \psi_{l,i} \tag{5b} $$

where $\mu_k$ is the positive step-size used by node $k$. To model uninformed nodes over the network, we
shall set $\mu_k = 0$ if node $k$ is uninformed. We assume that the network contains at least one informed
node. In this model, uninformed nodes do not collect data $\{d_k(i), u_{k,i}\}$ and, therefore, do not perform
the adaptation step (5a); they, however, continue to perform the combination or consultation step (5b).
In this way, informed nodes have access to data and participate in the adaptation and consultation steps, 
whereas uninformed nodes play an auxiliary role through their participation in the consultation step only. 
This participation is nevertheless important because it helps diffuse information across the network. One 
of the main contributions of this work is to examine how the proportion of informed nodes, and how 
the spatial distribution of these informed nodes, influence the learning and adaptation abilities of the 
network in terms of its convergence rate and mean-square performance. It will follow from the analysis 
that uninformed nodes also play an important role in determining the network performance. 
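
The ATC recursion (5a)-(5b) with uninformed nodes can be illustrated with a minimal simulation sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: real-valued data (so that $u^* = u^T$), unit-variance Gaussian regressors, uniform combination weights $a_{l,k} = 1/n_k$, a five-node ring, and the particular step-size and noise values. The function name `atc_diffusion` is hypothetical.

```python
import random

def atc_diffusion(neighbors, step_sizes, w_true, noise_std=0.1, num_iter=2000, seed=0):
    """Run the ATC recursion (5a)-(5b) with uniform combination weights.

    neighbors[k]  : list of neighbors of node k, including k itself
    step_sizes[k] : mu_k; a zero value marks node k as uninformed
    Returns the final weight estimates, one list of length M per node.
    """
    rng = random.Random(seed)
    N, M = len(neighbors), len(w_true)
    w = [[0.0] * M for _ in range(N)]
    for _ in range(num_iter):
        psi = []
        for k in range(N):
            if step_sizes[k] == 0.0:
                # Uninformed node: no data, so the adaptation step is skipped.
                psi.append(list(w[k]))
                continue
            u = [rng.gauss(0.0, 1.0) for _ in range(M)]  # regressor u_{k,i}
            d = sum(ui * wi for ui, wi in zip(u, w_true)) + rng.gauss(0.0, noise_std)
            err = d - sum(ui * wi for ui, wi in zip(u, w[k]))
            # Adaptation step (5a) for real-valued data.
            psi.append([wi + step_sizes[k] * ui * err for wi, ui in zip(w[k], u)])
        # Combination step (5b) with a_{l,k} = 1/n_k for l in N_k.
        w = [[sum(psi[l][m] for l in neighbors[k]) / len(neighbors[k]) for m in range(M)]
             for k in range(N)]
    return w

# Ring of 5 nodes; only nodes 0 and 1 are informed.
neighbors = [[(k - 1) % 5, k, (k + 1) % 5] for k in range(5)]
step_sizes = [0.05, 0.05, 0.0, 0.0, 0.0]
w_true = [1.0, -0.5]
estimates = atc_diffusion(neighbors, step_sizes, w_true)
max_error = max(abs(est[m] - w_true[m]) for est in estimates for m in range(2))
print(f"largest entrywise error across nodes: {max_error:.4f}")
```

In this sketch the three uninformed nodes never observe data, yet their estimates approach $w^o$ because the combination step spreads the information collected by nodes 0 and 1.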

III. Network Mean-Square Performance 

The mean-square performance of ATC networks has been studied in detail in [3] for the case where 
all nodes are informed. Expressions for the network performance, and conditions for its mean-square 
stability, were derived there by applying energy conservation arguments [27], [31]. In this section, we
start by showing how to extend the results to the case in which only a fraction of the nodes are informed.
The condition for mean-square stability will need to be properly adjusted as explained below in (13) and
(14). We start by examining mean stability.



A. Mean Stability 

Let the error vector for any node k be denoted by: 

$$ \tilde{w}_{k,i} = w^o - w_{k,i} \tag{6} $$

We collect all weight error vectors and step-sizes across the network into a block vector and block matrix:

$$ \tilde{w}_i = \mathrm{col}\{\tilde{w}_{1,i}, \ldots, \tilde{w}_{N,i}\}, \qquad \mathcal{M} = \mathrm{diag}\{\mu_1 I_M, \ldots, \mu_N I_M\} \tag{7} $$

where the notation $\mathrm{col}\{\cdot\}$ denotes the vector that is obtained by stacking its arguments on top of each
other, and $\mathrm{diag}\{\cdot\}$ constructs a diagonal matrix from its arguments. We also introduce the extended
combination matrix:

$$ \mathcal{A} = A \otimes I_M \tag{8} $$



where the symbol $\otimes$ denotes the Kronecker product of two matrices. Then, starting from (5a)-(5b) and
using model (1), some algebra will show that the global error vector in (7) evolves according to the
recursion:

$$ \tilde{w}_i = \mathcal{A}^T (I_{NM} - \mathcal{M}\mathcal{R}_i)\,\tilde{w}_{i-1} - \mathcal{A}^T \mathcal{M} s_i \tag{9} $$

where

$$ \mathcal{R}_i = \mathrm{diag}\{u_{1,i}^* u_{1,i}, \ldots, u_{N,i}^* u_{N,i}\}, \qquad s_i = \mathrm{col}\{u_{1,i}^* v_1(i), \ldots, u_{N,i}^* v_N(i)\} \tag{10} $$

Since the regressors $\{u_{k,i}\}$ are spatially and temporally independent, they are independent
of $\tilde{w}_{i-1}$. Taking expectation of both sides of (9), we find that the mean of $\tilde{w}_i$ evolves in time
according to the recursion:

$$ E\tilde{w}_i = \mathcal{B} \cdot E\tilde{w}_{i-1} \tag{11} $$



where we introduced the block matrices: 

$$ \mathcal{B} = \mathcal{A}^T (I_{NM} - \mathcal{M}\mathcal{R}), \qquad \mathcal{R} \triangleq E\mathcal{R}_i = \mathrm{diag}\{R_{u,1}, \ldots, R_{u,N}\} \tag{12} $$

In the following statement, we provide conditions to ensure mean stability of the network, namely, that
$E\tilde{w}_i \to 0$ as $i \to \infty$, even in the presence of uninformed nodes.

Theorem 1 (Mean stability). The ATC network (5) with at least one informed node converges in the
mean sense if the step-sizes $\{\mu_k\}$ and the combination matrix $A$ satisfy the following two conditions:
1) For every informed node $l$, its step-size $\mu_l$ satisfies:

$$ 0 < \mu_l \cdot \rho(R_{u,l}) < 2 \tag{13} $$



where the notation p(-) denotes the spectral radius of its matrix argument. 
2) There exists a finite integer $j$ such that for every node $k$, there exists an informed node $l$ satisfying:

$$ [A^j]_{l,k} > 0 \tag{14} $$

That is, the $(l,k)$th entry of $A^j$ is positive. [This condition essentially ensures that there is a path
from node $l$ to node $k$ in $j$ steps.]

Proof: We first introduce a block matrix norm. Let $\Sigma$ be an $N \times N$ block matrix with blocks of
size $M \times M$ each. Its block matrix norm, $\|\Sigma\|_b$, is defined as:

$$ \|\Sigma\|_b \triangleq \max_{1 \le k \le N} \left( \sum_{l=1}^{N} \|\Sigma_{k,l}\|_2 \right) \tag{15} $$

where $\Sigma_{k,l}$ denotes the $(k,l)$th block of $\Sigma$ and $\|\cdot\|_2$ denotes the largest singular value of its matrix
argument. That is, we first compute the 2-induced norm of each block $\Sigma_{k,l}$ and then find the $\infty$-norm
of the $N \times N$ matrix formed from the entries $\{\|\Sigma_{k,l}\|_2\}$. It can be verified that (15) satisfies the four
conditions for a matrix norm [32]. To prove mean stability of the ATC network (5), we need to show
that conditions (13)-(14) guarantee $\rho(\mathcal{B}) < 1$, or equivalently, $\rho(\mathcal{B}^j) < 1$ for any $j$. Now, note that

$$ \rho(\mathcal{B}^j) \le \|\mathcal{B}^j\|_b = \max_{1 \le k \le N} \left( \sum_{l=1}^{N} \big\|[\mathcal{B}^j]_{k,l}\big\|_2 \right) \tag{16} $$



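By the rules of matrix multiplication, the $(k,l)$th block (of size $M \times M$) of the matrix $\mathcal{B}^j$ is given by:

$$ [\mathcal{B}^j]_{k,l} = \sum_{m_1=1}^{N} \sum_{m_2=1}^{N} \cdots \sum_{m_{j-1}=1}^{N} \mathcal{B}_{k,m_1} \mathcal{B}_{m_1,m_2} \cdots \mathcal{B}_{m_{j-1},l} \tag{17} $$

where $\mathcal{B}_{k,l}$ is the $(k,l)$th block (of size $M \times M$) of the matrix $\mathcal{B}$ from (12) and is given by

$$ \mathcal{B}_{k,l} = a_{l,k} \cdot (I_M - \mu_l R_{u,l}) \tag{18} $$

Then, using the triangle inequality and the submultiplicative property of norms, the 2-induced norm of
$[\mathcal{B}^j]_{k,l}$ in (17) is bounded by:

$$ \big\|[\mathcal{B}^j]_{k,l}\big\|_2 \le \sum_{m_1=1}^{N} \sum_{m_2=1}^{N} \cdots \sum_{m_{j-1}=1}^{N} \|\mathcal{B}_{k,m_1}\|_2 \cdot \|\mathcal{B}_{m_1,m_2}\|_2 \cdots \|\mathcal{B}_{m_{j-1},l}\|_2 \tag{19} $$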

Note that in the case where $l \in \mathcal{N}_m$, we obtain from condition (13) and expression (18) that

$$ \|\mathcal{B}_{m,l}\|_2 = a_{l,m} \cdot \rho(I_M - \mu_l R_{u,l}) \ \begin{cases} < a_{l,m}, & \text{if node } l \text{ is informed} \\ = a_{l,m}, & \text{if node } l \text{ is uninformed} \end{cases} \tag{20} $$

where we replaced the 2-induced norm with the spectral radius because covariance matrices are Hermitian.
Relation (20) and condition (4) imply that



$$ \big\|[\mathcal{B}^j]_{k,l}\big\|_2 \le \sum_{m_1=1}^{N} \sum_{m_2=1}^{N} \cdots \sum_{m_{j-1}=1}^{N} a_{m_1,k}\, a_{m_2,m_1} \cdots a_{l,m_{j-1}} \tag{21} $$

Strict inequality holds in (21) if, and only if, the sequence $(l, m_{j-1}, \ldots, m_1, k)$ forms a path from node
$l$ to node $k$ using $j$ edges and there exists at least one informed node along the path. Since we know
from condition (14) that there is an informed node, say, node $l^o$, such that a path with $j$ edges exists
from node $l^o$ to node $k$, we then get from (16) and (21) that



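$$ \rho(\mathcal{B}^j) < \max_{1 \le k \le N} \sum_{l=1}^{N} \left( \sum_{m_1=1}^{N} \sum_{m_2=1}^{N} \cdots \sum_{m_{j-1}=1}^{N} a_{m_1,k}\, a_{m_2,m_1} \cdots a_{l,m_{j-1}} \right) = \max_{1 \le k \le N} \sum_{l=1}^{N} [A^j]_{l,k} = 1 \tag{22} $$

where the last equality is from condition (4) because $(A^T)^j \mathbb{1} = \mathbb{1}$ if $A^T \mathbb{1} = \mathbb{1}$. ■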

Condition (14) is satisfied if the matrix $A$ is primitive [32]. Since, by (4), $A$ is left-stochastic, it
follows from the Perron-Frobenius Theorem [32] that the eigen-structure of $A$ satisfies certain prominent
properties, which will be useful in the sequel, namely, that (a) $A$ has an eigenvalue at $\lambda = 1$; (b) the
eigenvalue at $\lambda = 1$ has multiplicity one; (c) all the entries of the right and left eigenvectors associated
with $\lambda = 1$ are positive; and (d) $\rho(A) = 1$ so that all other eigenvalues of $A$ have magnitude strictly
less than one. We remark that since in this paper we will be dealing with connected networks (where a
path always exists between any two arbitrary nodes), then condition (14) is automatically satisfied. As
such, the ATC strategy (5) will converge in the mean whenever there exists at least one informed node
with its step-size satisfying condition (13). In the next section, we show that conditions (13)-(14) further
guarantee mean-square convergence of the network when the step-sizes are sufficiently small.
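
The role of the path condition (14) in the proof of Theorem 1 can be seen in a small numerical sketch. It assumes scalar regressors ($M = 1$), in which case the block norm (15) reduces to the maximum absolute row sum, and uniform combination weights $a_{l,k} = 1/n_k$; the four-node path network and the value $\mu \sigma_u^2 = 0.5$ are hypothetical choices made only for illustration.

```python
def mean_matrix(neighbors, mu_sigma2):
    """Entries B[k][l] = a_{l,k} * (1 - mu_l * sigma_u^2), as in (18) with M = 1.

    Uniform weights a_{l,k} = 1/n_k are assumed; mu_sigma2[l] = mu_l * sigma_u^2,
    which is zero for an uninformed node.
    """
    N = len(neighbors)
    B = [[0.0] * N for _ in range(N)]
    for k in range(N):
        for l in neighbors[k]:
            B[k][l] = (1.0 - mu_sigma2[l]) / len(neighbors[k])
    return B

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def max_row_sum(X):
    """For M = 1 the block norm (15) is the maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in X)

# Path 0 - 1 - 2 - 3 (every node is its own neighbor); only node 0 is informed.
neighbors = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
B = mean_matrix(neighbors, mu_sigma2=[0.5, 0.0, 0.0, 0.0])

norms = {}
power = B
for j in range(1, 5):
    norms[j] = max_row_sum(power)  # norm of B^j
    power = matmul(power, B)
    print(f"j = {j}: ||B^j||_b = {norms[j]:.4f}")
```

In this example the norm stays at one for $j = 1, 2$ and drops below one at $j = 3$, which is the number of edges separating node 3 from the only informed node. This matches the strict inequality argument around (21)-(22).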

B. Mean-Square Stability 

The network mean-square-deviation (MSD) is used to assess how well the network estimates the weight
vector, $w^o$. The MSD is defined as follows:

$$ \mathrm{MSD} \triangleq \lim_{i \to \infty} \frac{1}{N} \sum_{k=1}^{N} E\|\tilde{w}_{k,i}\|^2 \tag{23} $$



where || • || denotes the Euclidean norm for vectors. To arrive at an expression for the MSD, we first derive 
a variance relation for the ATC network; the variance relation indicates how the weighted mean-square 
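error vector evolves over time [27]. Let $\Sigma$ denote an arbitrary nonnegative-definite Hermitian matrix that
we are free to choose, and let $\sigma = \mathrm{vec}(\Sigma)$ denote the vector that is obtained by stacking the columns
of $\Sigma$ on top of each other. We shall interchangeably use the notation $\|x\|_\sigma^2$ and $\|x\|_\Sigma^2$ to denote the
same weighted square quantity, $x^* \Sigma x$. Following the energy conservation approach of [27], [31], we can
motivate the following weighted variance relation:

$$ E\|\tilde{w}_i\|_\Sigma^2 = E\left( \|\tilde{w}_{i-1}\|^2_{(I_{NM} - \mathcal{R}_i \mathcal{M}) \mathcal{A} \Sigma \mathcal{A}^T (I_{NM} - \mathcal{M}\mathcal{R}_i)} \right) + \mathrm{Tr}(\Sigma \mathcal{A}^T \mathcal{M} \mathcal{S} \mathcal{M} \mathcal{A}) \tag{24} $$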

where 

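$$ \mathcal{S} = E s_i s_i^* = \mathrm{diag}\{\sigma_{v,1}^2 R_{u,1}, \ldots, \sigma_{v,N}^2 R_{u,N}\} \tag{25} $$

Relation (24) can be derived from (9) directly by multiplying both sides from the left by $\tilde{w}_i^* \Sigma$ and taking
expectations. Some algebra will then show that for sufficiently small step-sizes, expression (24) can be
approximated and rewritten as (see [3] for similar details, where terms that depend on higher-order powers
of the small step-sizes are ignored):

$$ E\|\tilde{w}_i\|_\sigma^2 = E\|\tilde{w}_{i-1}\|_{\mathcal{F}\sigma}^2 + [\mathrm{vec}(\mathcal{Y}^T)]^T \sigma \tag{26} $$

where

$$ \mathcal{F} = \mathcal{B}^T \otimes \mathcal{B}^*, \qquad \mathcal{Y} = \mathcal{A}^T \mathcal{M} \mathcal{S} \mathcal{M} \mathcal{A} \tag{27} $$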



Relation (26) is very useful and it can be used to study the transient behavior of the ATC network, as
well as its steady-state performance. The following result ensures that $E\|\tilde{w}_i\|_\sigma^2$ remains bounded and
converges to some constant as $i$ goes to infinity.

Theorem 2 (Mean-square stability). The ATC network (5) with at least one informed node is mean-square
stable if the step-sizes $\{\mu_k\}$ and the combination matrix $A$ satisfy conditions (13)-(14), and the step-sizes
$\{\mu_k\}$ are sufficiently small such that higher-order powers of them can be ignored.

Proof: Expression (26) holds for sufficiently small step-sizes. As shown in 0, the mean-square
convergence of (26) is guaranteed if $\rho(\mathcal{F}) < 1$. But since

$$ \rho(\mathcal{F}) = \rho(\mathcal{B}^T \otimes \mathcal{B}^*) = [\rho(\mathcal{B})]^2 \tag{28} $$

and conditions (13)-(14) guarantee $\rho(\mathcal{B}) < 1$ from Theorem 1, it also holds that $\rho(\mathcal{F}) < 1$. ■

C. Mean-Square Performance

Now, assume the network is mean-square stable and let the time index $i$ tend to infinity. From (26),
we obtain the steady-state relation

$$ \lim_{i \to \infty} E\|\tilde{w}_i\|^2_{(I_{N^2M^2} - \mathcal{F})\sigma} = [\mathrm{vec}(\mathcal{Y}^T)]^T \sigma \tag{29} $$

Since the eigenvalues of the matrix $\mathcal{F}$ are within the unit disc, the matrix $(I_{N^2M^2} - \mathcal{F})$ is invertible. Thus,
the network MSD, as given by (23), can be obtained by choosing $\sigma = (I_{N^2M^2} - \mathcal{F})^{-1} \mathrm{vec}(I_{NM})/N$,
which leads to the following useful expression

$$ \mathrm{MSD} = \frac{1}{N} [\mathrm{vec}(\mathcal{Y}^T)]^T (I_{N^2M^2} - \mathcal{F})^{-1} \mathrm{vec}(I_{NM}) \tag{30} $$



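Expression (30) relates the network MSD to the quantities $\{\mathcal{Y}, \mathcal{F}\}$ defined by (27). These quantities
contain information about the data statistical profile, the spatial distribution of informed nodes, and the
network topology through their dependence on $\{\mathcal{R}, \mathcal{M}, \mathcal{A}\}$. Using the following equalities for arbitrary
matrices $\{U, W, \Sigma\}$ of compatible dimensions:

$$ \mathrm{vec}(U \Sigma W) = (W^T \otimes U)\sigma, \qquad \mathrm{Tr}(\Sigma W) = [\mathrm{vec}(W^T)]^T \sigma \tag{31} $$

and the fact that, for any stable matrix $\mathcal{F}$, it holds:

$$ (I_{N^2M^2} - \mathcal{F})^{-1} = \sum_{j=0}^{\infty} \mathcal{F}^j \tag{32} $$

we can obtain an alternative expression for the network MSD from (27) and (30), namely,

$$ \mathrm{MSD} = \frac{1}{N} \sum_{j=0}^{\infty} \mathrm{Tr}\left[ \mathcal{B}^j \mathcal{Y} (\mathcal{B}^*)^j \right] \tag{33} $$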

This expression for the MSD will be the starting point for our analysis further ahead, when we examine 
the influence of the proportion of informed nodes on network performance. 
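
The series form of the MSD, in which the $j$th term is $\mathrm{Tr}[\mathcal{B}^j \mathcal{Y} (\mathcal{B}^*)^j]$, can be evaluated numerically by accumulating terms until they become negligible. The sketch below assumes scalar real-valued regressors ($M = 1$) with a common variance, a common noise variance, and uniform combination weights; the network, the step-sizes and the function name `network_msd` are hypothetical and serve only as an illustration.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def network_msd(neighbors, mu, sigma_u2, sigma_v2, tol=1e-14, max_terms=100000):
    """Sum (1/N) * Tr[B^j Y (B^*)^j] over j for scalar data and uniform weights."""
    N = len(neighbors)
    # Row k of A^T holds the weights a_{l,k} = 1/n_k for l in N_k.
    At = [[(1.0 / len(neighbors[k]) if l in neighbors[k] else 0.0) for l in range(N)]
          for k in range(N)]
    # B = A^T (I - M R): column l of A^T is scaled by (1 - mu_l * sigma_u^2).
    B = [[At[k][l] * (1.0 - mu[l] * sigma_u2) for l in range(N)] for k in range(N)]
    # Y = A^T M S M A, where M S M is diagonal with entries mu_l^2 sigma_v^2 sigma_u^2.
    msm = [mu[l] ** 2 * sigma_v2 * sigma_u2 for l in range(N)]
    Y = [[sum(At[k][l] * msm[l] * At[m][l] for l in range(N)) for m in range(N)]
         for k in range(N)]
    Bt = transpose(B)
    term = Y
    P = [[0.0] * N for _ in range(N)]  # running sum of B^j Y (B^T)^j
    for _ in range(max_terms):
        P = [[p + t for p, t in zip(prow, trow)] for prow, trow in zip(P, term)]
        term = matmul(matmul(B, term), Bt)
        if sum(term[k][k] for k in range(N)) < tol:
            break
    msd = sum(P[k][k] for k in range(N)) / N
    return msd, B, Y, P

# Path 0 - 1 - 2 - 3 with nodes 0 and 1 informed.
neighbors = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
mu = [0.05, 0.05, 0.0, 0.0]
msd, B, Y, P = network_msd(neighbors, mu, sigma_u2=1.0, sigma_v2=0.01)
print(f"network MSD from the truncated series: {msd:.3e}")
```

A convenient sanity check is that the accumulated sum $P$ satisfies the fixed-point relation $P = \mathcal{B} P \mathcal{B}^* + \mathcal{Y}$ up to the truncation tolerance.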



D. Convergence Rate 

We denote the convergence rate of the ATC strategy (5) by $r$ so that the smaller the value of $r$ is,
the faster the rate of convergence of $E\|\tilde{w}_i\|^2$ is towards its steady-state value. As indicated by (26), the
convergence rate is determined by the spectral radius of the matrix $\mathcal{F}$ in (27), i.e.,

$$ r = \rho(\mathcal{F}) = [\rho(\mathcal{B})]^2 \tag{34} $$

Let $\mathcal{N}_I$ denote the set of informed nodes, i.e., $k \in \mathcal{N}_I$ if node $k$ is informed. From now on, we introduce
the assumption below, which essentially assumes that all informed nodes have similar processing abilities
in that they use the same step-size value while observing processes arising from the same statistical
distribution.

Assumption 1. Assume that $\mu_k = \mu$ for all informed nodes and that the covariance matrices across all
nodes are also uniform, i.e., $R_{u,k} = R_u$. We continue to assume that the step-size is sufficiently small so
that it holds that $0 < \mu \cdot \rho(R_u) < 1$.

Then, we have the following useful result. 

Lemma 1 (Faster convergence rate). Consider two configurations of the same network: one with the set $\mathcal{N}_{I,1}$ of
informed nodes and another with the set $\mathcal{N}_{I,2}$ of informed nodes. Let $r_1$ and $r_2$ denote the corresponding convergence
rates for these two informed configurations. If $\mathcal{N}_{I,2} \supseteq \mathcal{N}_{I,1}$, then $r_2 \le r_1$.

Proof: Under Assumption 1, we have that

$$ I_M - \mu_l R_{u,l} = \begin{cases} I_M - \mu R_u, & \text{if node } l \text{ is informed} \\ I_M, & \text{if node } l \text{ is uninformed} \end{cases} \tag{35} $$

Then, the matrix $[\mathcal{B}^j]_{k,l}$ in (17) can be written as:

$$ [\mathcal{B}^j]_{k,l} = \sum_{m_1=1}^{N} \sum_{m_2=1}^{N} \cdots \sum_{m_{j-1}=1}^{N} a_{m_1,k}\, a_{m_2,m_1} \cdots a_{l,m_{j-1}} \cdot (I_M - \mu R_u)^{q_{l,k}} \tag{36} $$

where the exponent $q_{l,k}$ denotes the number of informed nodes along the path $(l, m_{j-1}, \ldots, m_1, k)$.
Note that $[\mathcal{B}^j]_{k,l}$ is a nonnegative-definite matrix because $(I_M - \mu R_u) > 0$ in view of the condition
$0 < \mu \rho(R_u) < 1$. In fact, all eigenvalues of $(I_M - \mu R_u)$ lie within the line segment $(0, 1)$. Moreover,
since $\mathcal{N}_{I,1} \subseteq \mathcal{N}_{I,2}$, we have that $q_{l,k}^{(1)} \le q_{l,k}^{(2)}$ and, therefore, the matrix difference

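$$ [\mathcal{B}^{(1)j}]_{k,l} - [\mathcal{B}^{(2)j}]_{k,l} = \sum_{m_1=1}^{N} \sum_{m_2=1}^{N} \cdots \sum_{m_{j-1}=1}^{N} a_{m_1,k}\, a_{m_2,m_1} \cdots a_{l,m_{j-1}} \times \left[ (I_M - \mu R_u)^{q_{l,k}^{(1)}} - (I_M - \mu R_u)^{q_{l,k}^{(2)}} \right] \tag{37} $$

is a nonnegative-definite matrix, where the superscripts denote the indices of the informed configurations,
$\mathcal{N}_{I,1}$ or $\mathcal{N}_{I,2}$. Since $[\mathcal{B}^{(1)j}]_{k,l}$, $[\mathcal{B}^{(2)j}]_{k,l}$, and $[\mathcal{B}^{(1)j}]_{k,l} - [\mathcal{B}^{(2)j}]_{k,l}$ are all nonnegative-definite, then it must
hold that

$$ \big\|[\mathcal{B}^{(1)j}]_{k,l}\big\|_2 \ge \big\|[\mathcal{B}^{(2)j}]_{k,l}\big\|_2 \tag{38} $$

Relation (38) can be established by contradiction. Suppose that (38) does not hold, i.e., $\rho([\mathcal{B}^{(1)j}]_{k,l}) <
\rho([\mathcal{B}^{(2)j}]_{k,l})$, as $[\mathcal{B}^{(1)j}]_{k,l}$ and $[\mathcal{B}^{(2)j}]_{k,l}$ are Hermitian from (36). In addition, let $x$ denote the eigenvector
that is associated with the largest eigenvalue of $[\mathcal{B}^{(2)j}]_{k,l}$, i.e., $[\mathcal{B}^{(2)j}]_{k,l}\, x = \rho([\mathcal{B}^{(2)j}]_{k,l})\, x$. Then, we
obtain the following contradiction to the nonnegative-definiteness of $[\mathcal{B}^{(1)j}]_{k,l} - [\mathcal{B}^{(2)j}]_{k,l}$: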



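$$ x^* \left( [\mathcal{B}^{(1)j}]_{k,l} - [\mathcal{B}^{(2)j}]_{k,l} \right) x = x^* [\mathcal{B}^{(1)j}]_{k,l}\, x - \rho\big([\mathcal{B}^{(2)j}]_{k,l}\big)\, x^* x < 0 \tag{39} $$

by the Rayleigh-Ritz Theorem [32]. By the definition of the block matrix norm in (15), we arrive at

$$ \big\|\mathcal{B}^{(1)j}\big\|_b^{1/j} \ge \big\|\mathcal{B}^{(2)j}\big\|_b^{1/j} \tag{40} $$

for all $j$. Let $j$ tend to infinity and we obtain that

$$ \rho(\mathcal{B}^{(1)}) \ge \rho(\mathcal{B}^{(2)}) \tag{41} $$

where we used the fact that $\rho(\mathcal{B}) = \lim_{j \to \infty} (\|\mathcal{B}^j\|)^{1/j}$ for any matrix norm [32]. ■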

The result of Lemma 1 shows that if we enlarge the set of informed nodes, the convergence rate
decreases and convergence becomes faster. The following result provides bounds for the convergence
rate.

Lemma 2 (Bound on convergence rate). The convergence rate is bounded by

$$ [1 - \mu \cdot \lambda_M(R_u)]^2 \le r < 1 \tag{42} $$

where $\lambda_M(R_u)$ denotes the smallest eigenvalue of $R_u$.



Proof: Since the ATC network is mean-square stable, i.e., $\rho(\mathcal{B}) < 1$, the upper bound is obvious.
On the other hand, from Lemma 1, the value of $\rho(\mathcal{B})$ achieves its minimum value when all nodes are
informed, i.e., the matrix $\mathcal{M}$ in (7) becomes $\mathcal{M} = \mu I_{NM}$. In this case, the matrix $\mathcal{B}$ in (12) can be
written as:

$$ \mathcal{B}^o = A^T \otimes (I_M - \mu R_u) \tag{43} $$

where the superscript $o$ is used to denote the matrix $\mathcal{B}$ when all nodes are informed. Then,

$$ \rho(\mathcal{B}) \ge \rho(\mathcal{B}^o) = \rho(A^T) \cdot \rho(I_M - \mu R_u) \tag{44} $$

We already know that $\rho(A^T) = 1$. In addition, because $(I_M - \mu R_u) > 0$, we have that

$$ \rho(I_M - \mu R_u) = 1 - \mu \cdot \lambda_M(R_u) \tag{45} $$

and we arrive at the lower bound in (42). ■
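
Lemmas 1 and 2 can be checked numerically on a small example. The sketch below assumes scalar regressors ($M = 1$), uniform combination weights, and a hypothetical four-node path network with $\mu \lambda(R_u) = 0.1$. The spectral radius is estimated by power iteration, which is adequate here only because the matrix is entrywise nonnegative and primitive; this is a choice made for the illustration, not a method used in the paper.

```python
def spectral_radius(B, num_iter=5000):
    """Power iteration for an entrywise nonnegative, primitive matrix B."""
    x = [1.0 / len(B)] * len(B)
    rho = 0.0
    for _ in range(num_iter):
        y = [sum(b * xi for b, xi in zip(row, x)) for row in B]
        rho = sum(y)  # x sums to one, so this is the growth factor
        x = [yi / rho for yi in y]
    return rho

def mean_matrix(neighbors, informed, mu_lambda):
    """B[k][l] = a_{l,k} * (1 - mu * lambda) if l is informed, else a_{l,k}."""
    N = len(neighbors)
    B = [[0.0] * N for _ in range(N)]
    for k in range(N):
        for l in neighbors[k]:
            scale = 1.0 - mu_lambda if l in informed else 1.0
            B[k][l] = scale / len(neighbors[k])
    return B

# Path 0 - 1 - 2 - 3; nested informed sets of growing size.
neighbors = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
mu_lambda = 0.1  # mu * lambda(R_u) for scalar data
rates = {}
for informed in ({0}, {0, 1}, {0, 1, 2, 3}):
    rho = spectral_radius(mean_matrix(neighbors, informed, mu_lambda))
    rates[len(informed)] = rho ** 2  # convergence rate r = rho(B)^2
    print(f"{len(informed)} informed node(s): r = {rates[len(informed)]:.4f}")
```

In this example the rate decreases as the informed set grows, and it reaches the lower bound $(1 - 0.1)^2 = 0.81$ of (42) when all nodes are informed.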

IV. Two Network Topology Models

Before examining the effect of informed nodes on network performance, we pause to introduce two 
popular models that are widely used in the study of complex networks. We shall call upon these models 
later to illustrate the theoretical findings of the article. For both models, we let $n_k$ denote the degree
(number of neighbors) of node $k$. Note that since node $k$ is a neighbor of itself, we have $n_k \ge 1$. In
addition, we assume the network topology is symmetric so that if node $l$ is a neighbor of node $k$, then
node $k$ is also a neighbor of node $l$.

A. Erdos-Renyi Model 

In the Erdos-Renyi model [33], there is a single parameter, called the edge probability and denoted by
$p \in [0,1]$. The edge probability specifies the probability that two distinct nodes are connected. In this
way, the degree of any node $k$ becomes a random variable that is distributed according to a
binomial distribution, i.e.,

$$ f(n_k) = \binom{N-1}{n_k - 1} p^{n_k - 1} (1-p)^{N - n_k} \tag{46} $$

The expected degree for node $k$, denoted by $\bar{n}_k$, is then

$$ \bar{n}_k = (N-1)p + 1 \tag{47} $$

Note that, in this model, all nodes have the same expected degree since the right-hand side is independent
of $k$. Therefore, the expected network degree, $\eta$, becomes

$$ \eta \triangleq \frac{1}{N} \sum_{k=1}^{N} \bar{n}_k = (N-1)p + 1 \tag{48} $$

B. Scale-Free Model 

The Erdos-Renyi model does not capture several prominent features of real networks such as the small-world
phenomenon and the power-law degree distribution [25]. The small-world phenomenon refers to
the fact that the number of edges between two arbitrary nodes is small on average. The power-law degree
distribution refers to the fact that the number of nodes with degree $n_k$ falls off as an inverse power of
$n_k$, namely,

$$ f(n_k) \sim c\, n_k^{-\gamma} \tag{49} $$

with two positive constants $c$ and $\gamma$. Networks with degree distributions of the form (49) are called
scale-free networks [34] and can be generated using preferential attachment models. We briefly describe
the model proposed by [35]. The model starts with a small connected network with $N_0$ nodes. At every
iteration, we add a new node, which will connect to $m \le N_0$ distinct nodes besides itself. The probability
of connecting to a node is proportional to its degree. As time evolves, nodes with higher degree are more
likely to be connected to new nodes. Eventually, there are a few nodes that connect to most of the
network. This phenomenon is observed in real networks, such as the Internet [25]. If $N \gg N_0$, the
expected degree of the network approximates to

$$ \eta \approx 2m + 1 \tag{50} $$

because every new arrival node contributes $2m + 1$ degrees to the network.
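
Both topology models are easy to generate, and the expected-degree expressions (48) and (50) can be compared against a realization. The sketch below is only illustrative: the fully connected initial core, the rejection of duplicate targets in the preferential attachment step, and all parameter values are assumptions made here, and the function names are hypothetical. Degrees count the node itself, following the convention of this section.

```python
import random

def erdos_renyi(N, p, rng):
    """Link each pair of distinct nodes with probability p; every node neighbors itself."""
    nbrs = [{k} for k in range(N)]
    for k in range(N):
        for l in range(k + 1, N):
            if rng.random() < p:
                nbrs[k].add(l)
                nbrs[l].add(k)
    return nbrs

def preferential_attachment(N, N0, m, rng):
    """Grow a network from a fully connected core of N0 nodes (an assumption here).

    Each new node links to m distinct existing nodes, drawn with probability
    proportional to their current degree; duplicate draws are simply repeated.
    """
    nbrs = [set(range(N0)) for _ in range(N0)]
    for new in range(N0, N):
        existing = list(range(new))
        weights = [len(nbrs[k]) for k in existing]
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(existing, weights=weights)[0])
        nbrs.append({new} | targets)
        for t in targets:
            nbrs[t].add(new)
    return nbrs

def average_degree(nbrs):
    return sum(len(s) for s in nbrs) / len(nbrs)

rng = random.Random(1)
avg_er = average_degree(erdos_renyi(400, 0.05, rng))
avg_sf = average_degree(preferential_attachment(500, 3, 2, rng))
print(f"Erdos-Renyi: average degree {avg_er:.2f}, (N-1)p+1 = {399 * 0.05 + 1:.2f}")
print(f"Scale-free : average degree {avg_sf:.3f}, 2m+1 = {2 * 2 + 1}")
```

For the preferential attachment realization the average degree is fixed by the construction, $[N_0^2 + (N - N_0)(2m+1)]/N$, which approaches $2m + 1$ when $N \gg N_0$; only the way the degrees are distributed among the nodes is random.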

V. Effect of Topology and Node Distribution 

We are now ready to examine in some detail the effect of network topology and node distribution on 
the behavior of the network MSD given by (33) and the convergence rate given by (34).

A. Eigen-structure of B

To begin with, we observe from (33) and (34) that the network MSD and convergence rate depend
on the matrix $\mathcal{B}$ from (12) in a non-trivial manner. To gain insight into the network performance, we
need to examine closely the eigen-structure of $\mathcal{B}$, which is related to the combination matrix $A$ and the
covariance matrix $R_u$. We start from the eigen-structure of $A$. To facilitate the analysis, we assume that
$A$ is diagonalizable, i.e., there exists an invertible matrix, $U$, and a diagonal matrix, $\Lambda$, such that

$$ A^T = U \Lambda U^{-1} \tag{51} $$

Now, let $r_k$ and $s_k$ ($k = 1, \ldots, N$) denote an arbitrary pair of right and left eigenvectors of $A^T$
corresponding to the eigenvalue $\lambda_k(A)$. Then,



$$ U = [\, r_1 \ \cdots \ r_N \,], \qquad U^{-1} = \mathrm{col}\{s_1^*, \ldots, s_N^*\}, \qquad \Lambda = \mathrm{diag}\{\lambda_1(A), \ldots, \lambda_N(A)\} \tag{52} $$

Obviously, it holds that $s_l^* r_k = \delta_{kl}$ since $U^{-1} U = I_N$. We further assume that the right eigenvectors of
$A^T$ satisfy:

$$ |r_l^* r_k| \ll \|r_k\|^2 \tag{53} $$

for $l \ne k$. Condition (53) states that the $\{r_k\}$ are approximately orthogonal (see example below). Without
loss of generality, we order the eigenvalues of $A^T$ in decreasing order and assume $1 = \lambda_1(A) > |\lambda_2(A)| \ge
\cdots \ge |\lambda_N(A)|$. The eigen-decomposition of $A^T$ can also be written as:



$$ A^T = \sum_{k=1}^{N} \lambda_k(A)\, r_k s_k^* \tag{54} $$

Note that any symmetric combination matrix satisfies both conditions (51) and (53) since then $r_l^T r_k = \delta_{kl}$.
Another example of a useful combination matrix $A$ that is not symmetric but still satisfies (51) is the
uniform combination matrix, i.e.,

$$ a_{l,k} = \begin{cases} 1/n_k, & \text{if } l \in \mathcal{N}_k \\ 0, & \text{otherwise} \end{cases} \tag{55} $$

Lemma 3 (Diagonalization of uniform combination matrix). For a connected and symmetric network
graph, the matrix $A$ defined by (55) is diagonalizable and has real eigenvalues.

Proof: We introduce the degree matrix, $D$, and the adjacency matrix, $C$, of the network graph, whose
entries are defined as follows:

$$ D = \mathrm{diag}\{n_1, \ldots, n_N\}, \qquad [C]_{k,l} = \begin{cases} 1, & \text{if } l \in \mathcal{N}_k \\ 0, & \text{otherwise} \end{cases} \tag{56} $$

Then, it is straightforward to verify that the matrix $A^T$ in (55) can be written as:

$$ A^T = D^{-1} C \tag{57} $$

which shows that $A^T$ is similar to the real-valued matrix $A_s$ defined by:

$$ A_s \triangleq D^{1/2} A^T D^{-1/2} = D^{-1/2} C D^{-1/2} \tag{58} $$

where $D^{1/2} = \mathrm{diag}\{\sqrt{n_1}, \ldots, \sqrt{n_N}\}$. Since the topology is assumed to be symmetric, the matrix $C$ is
symmetric, and so is $A_s$. Therefore, there exists an orthogonal matrix, $U_s$, and a diagonal matrix with
real diagonal entries, $\Lambda$, such that

$$ A_s = U_s \Lambda U_s^T \tag{59} $$

From (58), we let

$$ U = D^{-1/2} U_s, \qquad U^{-1} = U_s^T D^{1/2} \tag{60} $$

and we obtain (51). ■
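
The similarity argument in the proof of Lemma 3 can be verified entry by entry on a small graph. The sketch below builds $A^T = D^{-1}C$ and $A_s = D^{-1/2} C D^{-1/2}$ for a hypothetical four-node symmetric topology and confirms that $A_s$ is symmetric even though $A^T$ is not; the function name `uniform_matrices` is an illustrative choice.

```python
import math

def uniform_matrices(neighbors):
    """Return the degrees, A^T = D^{-1} C, and A_s = D^{-1/2} C D^{-1/2}."""
    N = len(neighbors)
    n = [len(neighbors[k]) for k in range(N)]
    C = [[1.0 if l in neighbors[k] else 0.0 for l in range(N)] for k in range(N)]
    At = [[C[k][l] / n[k] for l in range(N)] for k in range(N)]
    As = [[C[k][l] / math.sqrt(n[k] * n[l]) for l in range(N)] for k in range(N)]
    return n, At, As

# Symmetric topology: node 1 is linked to nodes 0, 2 and 3; every node neighbors itself.
neighbors = [[0, 1], [0, 1, 2, 3], [1, 2], [1, 3]]
n, At, As = uniform_matrices(neighbors)
N = len(n)

# Similarity transformation D^{1/2} A^T D^{-1/2}, computed entry by entry.
similar = [[math.sqrt(n[k]) * At[k][l] / math.sqrt(n[l]) for l in range(N)]
           for k in range(N)]

print("A^T entries (0,1) and (1,0):", At[0][1], At[1][0])
print("A_s entries (0,1) and (1,0):", As[0][1], As[1][0])
```

Since $A_s$ is real and symmetric, its eigenvalues are real, and the same holds for $A^T$ by similarity.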

Note that since the matrices $U_s$ and $D^{1/2}$ in (60) are real-valued, so are the eigenvectors of the uniform
combination matrix, $\{r_k, s_k\}$. Furthermore, from (60), we can express $\{r_k, s_k\}$ in terms of the eigenvectors
of $A_s$ defined in (58). Let $r_k^s$ denote the $k$th eigenvector of $A_s$ and let $r_{k,l}^s$ denote the $l$th entry of $r_k^s$.
Likewise, let $\{r_{k,l}, s_{k,l}\}$ denote the $l$th entries of $\{r_k, s_k\}$. Then, we have

$$ r_{k,l} = \frac{r_{k,l}^s}{\sqrt{n_l}}, \qquad s_{k,l} = \sqrt{n_l} \cdot r_{k,l}^s \tag{61} $$



For the Erdos-Renyi model, since nodes have on average the same expected degree given by (47), i.e., $n_k \approx \bar{n}_k = \bar{\eta}$, the right eigenvectors $\{r_k\}$ of the uniform combination matrix defined by (55) are approximately orthogonal in view of

$$|r_l^T r_k| = \left| \sum_{m=1}^{N} r_{l,m} r_{k,m} \right| = \left| \sum_{m=1}^{N} \frac{1}{n_m}\, r_{l,m}^s r_{k,m}^s \right| \approx \frac{1}{\bar{\eta}} \left| \sum_{m=1}^{N} r_{l,m}^s r_{k,m}^s \right| = \frac{\delta_{kl}}{\bar{\eta}} \qquad (62)$$

Approximation (62) is particularly good when $N$ is large since most nodes have degree similar to $\bar{\eta}$. Even though this approximation is not generally valid for the scale-free model, simulations further ahead indicate that the approximation still leads to a good match between theory and practice.

Remark 1. We note that for networks with random degree distributions, such as the Erdos-Renyi and scale-free networks of Sec. IV, the matrix $A$ is generally a random matrix. In the sequel, we shall derive expressions for the convergence rate and network MSD for realizations of the network (see expressions (122) and (123) further ahead). To evaluate the expected convergence rate and network MSD over a probability distribution for the degrees (such as (46) or (49)), we will need to average expressions (122) and (123) over the degree distribution. ■

For the covariance matrix $R_u$, we let $z_m$ ($m = 1, \ldots, M$) denote the eigenvector of $R_u$ that is associated with the eigenvalue $\lambda_m(R_u)$. Then, the eigen-decomposition of $R_u$ is given by:

$$R_u = \sum_{m=1}^{M} \lambda_m(R_u)\, z_m z_m^* \qquad (63)$$

where the $\{z_m\}$ are orthonormal, i.e., $z_n^* z_m = \delta_{mn}$, and the $\{\lambda_m(R_u)\}$ are again arranged in decreasing order with $\lambda_1(R_u) \geq \lambda_2(R_u) \geq \cdots \geq \lambda_M(R_u) > 0$. In the sequel, for any vector $x$, we use the notation $x_{k:l}$ to denote a sub-vector of $x$ formed from the $k$th up to the $l$th entries of $x$. Also, we let $N_I$ denote the number of informed nodes in the network. Without loss of generality, we label the network nodes such that the first $N_I$ nodes are informed, i.e., $\mathcal{N}_I = \{1, 2, \ldots, N_I\}$. The next result establishes a useful approximation for the eigen-structure of the matrix $\mathcal{B}$ defined in (12); it shows how the eigenvectors and eigenvalues of $\mathcal{B}$ can be constructed from the eigenvalues and eigenvectors for $\{A^T, R_u\}$ given by (54)
and (63).

Lemma 4 (Eigen-structure of $\mathcal{B}$). For a symmetric ATC network with at least one informed node, the matrix $\mathcal{B} = \mathcal{A}^T (I - \mathcal{M}\mathcal{R})$ has approximate right and left eigenvector pairs $\{r^b_{k,m}, s^b_{k,m}\}$ given by:

$$r^b_{k,m} \approx r_k \otimes z_m, \qquad k = 1, \ldots, N; \; m = 1, \ldots, M \qquad (64)$$

$$s^{b*}_{k,m} \approx \frac{\lambda_k(A)}{\lambda_{k,m}(\mathcal{B})} \begin{bmatrix} (1 - \mu\lambda_m(R_u)) \cdot s^T_{k,1:N_I} \otimes z_m^* & \;\; s^T_{k,N_I+1:N} \otimes z_m^* \end{bmatrix} \qquad (65)$$

where $\lambda_{k,m}(\mathcal{B})$ denotes the eigenvalue of the eigenvector pair $\{r^b_{k,m}, s^b_{k,m}\}$ and is approximated by:

$$\lambda_{k,m}(\mathcal{B}) \approx \lambda_k(A) \cdot \left[ 1 - \mu\lambda_m(R_u) \cdot s^T_{k,1:N_I} r_{k,1:N_I} \right] \qquad (66)$$

Proof: We first note from the definition of $\mathcal{A}$ and (54) that the matrix $\mathcal{A}^T$ can be written as

$$\mathcal{A}^T = \sum_{l=1}^{N} \lambda_l(A) (r_l \otimes I_M)(s_l^T \otimes I_M) \qquad (67)$$



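Then, the matrix $\mathcal{B}$ in (12) becomes

$$\mathcal{B} = \sum_{l=1}^{N} \lambda_l(A)(r_l \otimes I_M)(s_l^T \otimes I_M) \left( I_{NM} - \begin{bmatrix} \mu I_{N_I M} & \\ & 0_{(N-N_I)M} \end{bmatrix} (I_N \otimes R_u) \right) = \sum_{l=1}^{N} \lambda_l(A)(r_l \otimes I_M) \begin{bmatrix} s^T_{l,1:N_I} \otimes (I_M - \mu R_u) & \;\; s^T_{l,N_I+1:N} \otimes I_M \end{bmatrix} \qquad (68)$$

Multiplying $\mathcal{B}$ by the $r^b_{k,m}$ defined in (64) from the right, we obtain

$$\begin{aligned}
\mathcal{B} \cdot r^b_{k,m} &= \sum_{l=1}^{N} \lambda_l(A) \cdot (r_l \otimes I_M) \left[ s^T_{l,1:N_I} r_{k,1:N_I} \otimes (I_M - \mu R_u) z_m + s^T_{l,N_I+1:N} r_{k,N_I+1:N} \otimes z_m \right] \\
&= \sum_{l=1}^{N} \lambda_l(A) \cdot \left[ (1 - \mu\lambda_m(R_u))\, s^T_{l,1:N_I} r_{k,1:N_I} + s^T_{l,N_I+1:N} r_{k,N_I+1:N} \right] (r_l \otimes I_M)(1 \otimes z_m) \\
&= \sum_{l=1}^{N} \lambda_l(A) \cdot \left[ s_l^T r_k - \mu\lambda_m(R_u) \cdot s^T_{l,1:N_I} r_{k,1:N_I} \right] \cdot (r_l \otimes z_m) \\
&= \lambda_k(A) \cdot \left[ 1 - \mu\lambda_m(R_u) \cdot s^T_{k,1:N_I} r_{k,1:N_I} \right] \cdot (r_k \otimes z_m) - \mu\lambda_m(R_u) \sum_{l \neq k} \lambda_l(A) \cdot s^T_{l,1:N_I} r_{k,1:N_I} \cdot (r_l \otimes z_m)
\end{aligned} \qquad (69)$$

where we used that $s_l^T r_k = \delta_{kl}$. For sufficiently small step-sizes, we can ignore the second term in the last equation of (69) and write:

$$\mathcal{B} \cdot r^b_{k,m} \approx \lambda_k(A) \cdot \left[ 1 - \mu\lambda_m(R_u) \cdot s^T_{k,1:N_I} r_{k,1:N_I} \right] \cdot (r_k \otimes z_m) = \lambda_{k,m}(\mathcal{B}) \cdot r^b_{k,m} \qquad (70)$$

Note that approximation (70) is particularly good for the uniform combination matrix in (55) since, from (61) and by the Cauchy-Schwarz inequality, we have

$$\left| s^T_{l,1:N_I} r_{k,1:N_I} \right| = \left| \sum_{m=1}^{N_I} r^s_{l,m} r^s_{k,m} \right| \leq \sqrt{\sum_{m=1}^{N_I} (r^s_{l,m})^2} \cdot \sqrt{\sum_{m=1}^{N_I} (r^s_{k,m})^2} \leq 1 \qquad (71)$$

Following similar arguments, we can verify that

$$\begin{aligned}
s^{b*}_{k,m} \cdot \mathcal{B} &= \frac{\lambda_k(A)}{\lambda_{k,m}(\mathcal{B})} \sum_{l=1}^{N} \lambda_l(A) \cdot \left[ s_k^T r_l - \mu\lambda_m(R_u) \cdot s^T_{k,1:N_I} r_{l,1:N_I} \right] \times (1 \otimes z_m^*) \begin{bmatrix} s^T_{l,1:N_I} \otimes (I_M - \mu R_u) & \;\; s^T_{l,N_I+1:N} \otimes I_M \end{bmatrix} \\
&\approx \frac{\lambda_k(A)}{\lambda_{k,m}(\mathcal{B})} \cdot \lambda_{k,m}(\mathcal{B}) \cdot \begin{bmatrix} (1 - \mu\lambda_m(R_u))\, s^T_{k,1:N_I} \otimes z_m^* & \;\; s^T_{k,N_I+1:N} \otimes z_m^* \end{bmatrix} \\
&= \lambda_{k,m}(\mathcal{B}) \cdot s^{b*}_{k,m}
\end{aligned} \qquad (72)$$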



Now, we argue that the approximate eigenvalues of $\mathcal{B}$ in (66) have magnitude less than one, i.e., $|\lambda_{k,m}(\mathcal{B})| < 1$ for all $k$ and $m$. Note that, since $|\lambda_k(A)| < 1$ for $k > 1$ and for sufficiently small step-sizes, we have $|\lambda_{k,m}(\mathcal{B})| \approx |\lambda_k(A)| < 1$ for $k > 1$. For $k = 1$, $\lambda_1(A) = 1$. However, since the eigenvectors $\{r_1, s_1\}$ have all positive entries, as we remarked before, we have $0 < s^T_{1,1:N_I} r_{1,1:N_I} \leq s_1^T r_1 = 1$. In addition, from Assumption 1 that $0 < \mu\rho(R_u) < 1$ and $\lambda_m(R_u) > 0$ for all $m$, we know that

$$0 < 1 - \mu\rho(R_u) \leq 1 - \mu\lambda_m(R_u) \cdot s^T_{1,1:N_I} r_{1,1:N_I} < 1 \qquad (73)$$

and we conclude that $|\lambda_{1,m}(\mathcal{B})| < 1$ for all $m$. For the uniform combination matrix defined in (55), since all eigenvectors and eigenvalues of $A$ are real-valued, we further have that the $\{\lambda_{k,m}(\mathcal{B})\}$ are real. ■



B. Simplifying the MSD Expression (1221) 

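Using the result of Lemma 4, we find that the eigen-decomposition for the matrix $\mathcal{B}^j$ has the approximate form:

$$\mathcal{B}^j \approx \sum_{k=1}^{N} \sum_{m=1}^{M} \lambda^j_{k,m}(\mathcal{B}) \cdot r^b_{k,m} s^{b*}_{k,m} \qquad (74)$$

Then, we can rewrite the network MSD (33) in the form:

$$\mathrm{MSD} \approx \frac{1}{N} \sum_{j=0}^{\infty} \sum_{k,l=1}^{N} \sum_{m,n=1}^{M} \mathrm{Tr}\left( \lambda^j_{k,m}(\mathcal{B}) \lambda^{*j}_{l,n}(\mathcal{B}) \cdot r^b_{k,m} s^{b*}_{k,m} \mathcal{Y} s^b_{l,n} r^{b*}_{l,n} \right) = \sum_{k,l=1}^{N} \sum_{m,n=1}^{M} \frac{(r^{b*}_{l,n} r^b_{k,m}) \cdot (s^{b*}_{k,m} \mathcal{Y} s^b_{l,n})}{N \left[ 1 - \lambda_{k,m}(\mathcal{B}) \lambda^*_{l,n}(\mathcal{B}) \right]} \qquad (75)$$

Moreover, from (64) and assumption (53), since

$$r^{b*}_{l,n} r^b_{k,m} = (r_l^T r_k) \otimes (z_n^* z_m) \approx \|r_k\|^2 \cdot \delta_{kl} \cdot \delta_{mn} \qquad (76)$$

expression (75) simplifies to:

$$\mathrm{MSD} \approx \sum_{k=1}^{N} \sum_{m=1}^{M} \frac{\|r_k\|^2 \cdot s^{b*}_{k,m} \mathcal{Y} s^b_{k,m}}{N \left[ 1 - |\lambda_{k,m}(\mathcal{B})|^2 \right]} \qquad (77)$$

Expression (77) can be simplified further once we evaluate the term in the numerator. We start by expressing the matrix $\mathcal{Y}$ from (27) as:

$$\mathcal{Y} = \mathcal{Z} \Omega^{-1} \mathcal{Z}^* \qquad (78)$$

where

$$\mathcal{Z} = \mathcal{A}^T \mathcal{M} \mathcal{R} \qquad (79)$$

$$\Omega = \mathrm{diag}\{\sigma^{-2}_{v,1} R_u, \ldots, \sigma^{-2}_{v,N} R_u\} = \Sigma_v^{-1} \otimes R_u \qquad (80)$$

with $\Sigma_v = \mathrm{diag}\{\sigma^2_{v,1}, \ldots, \sigma^2_{v,N}\}$. Then, we get

$$s^{b*}_{k,m} \mathcal{Y} s^b_{k,m} = \left\| s^{b*}_{k,m} \mathcal{Z} \Omega^{-1/2} \right\|^2 \qquad (81)$$

Note from (67) and (68) that the matrix $\mathcal{Z}$ in (79) can be written as:

$$\mathcal{Z} = \mathcal{A}^T - \mathcal{B} = \sum_{l=1}^{N} \lambda_l(A)(r_l \otimes I_M) \begin{bmatrix} s^T_{l,1:N_I} \otimes \mu R_u & \;\; s^T_{l,N_I+1:N} \otimes 0_M \end{bmatrix} \qquad (82)$$

We then obtain from (63), (79), and (82) that:

$$\begin{aligned}
s^{b*}_{k,m} \mathcal{Z} \Omega^{-1/2} &= \sum_{l=1}^{N} \lambda_l(A) \cdot s^{b*}_{k,m}(r_l \otimes I_M) \cdot \begin{bmatrix} s^T_{l,1:N_I} \otimes \mu R_u & \;\; s^T_{l,N_I+1:N} \otimes 0_M \end{bmatrix} \Omega^{-1/2} \\
&\approx \lambda_k(A) \cdot (1 \otimes z_m^*) \begin{bmatrix} s^T_{k,1:N_I} \otimes \mu R_u & \;\; s^T_{k,N_I+1:N} \otimes 0_M \end{bmatrix} \Omega^{-1/2} \\
&= \lambda_k(A) \cdot \begin{bmatrix} s^T_{k,1:N_I} \Sigma^{1/2}_{v,1:N_I} \otimes \mu \lambda^{1/2}_m(R_u)\, z_m^* & \;\; 0_{1 \times (N-N_I)M} \end{bmatrix}
\end{aligned} \qquad (83)$$

Therefore, the term $s^{b*}_{k,m} \mathcal{Y} s^b_{k,m}$ in (81) becomes

$$s^{b*}_{k,m} \mathcal{Y} s^b_{k,m} = \left( s^{b*}_{k,m} \mathcal{Z} \Omega^{-1/2} \right) \left( s^{b*}_{k,m} \mathcal{Z} \Omega^{-1/2} \right)^* \approx \mu^2 \lambda_m(R_u) |\lambda_k(A)|^2 \cdot s^T_{k,1:N_I} \Sigma_{v,1:N_I} s_{k,1:N_I} \qquad (84)$$

where we used that $z_m^* z_m = 1$. Then, substituting (66) and (84) into (77), we arrive at the following expression for the network MSD in terms of the eigenvalues and eigenvectors of $A^T$ and the eigenvalues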
of $R_u$.

Theorem 3 (Network MSD). The network MSD of the ATC strategy can be approximately expressed as

$$\mathrm{MSD} \approx \sum_{k=1}^{N} \sum_{m=1}^{M} \frac{\mu^2 \lambda_m(R_u) |\lambda_k(A)|^2 \cdot \|r_k\|^2 \cdot s^T_{k,1:N_I} \Sigma_{v,1:N_I} s_{k,1:N_I}}{N \left[ 1 - |\lambda_k(A)|^2 \cdot \left| 1 - \mu\lambda_m(R_u) \cdot s^T_{k,1:N_I} r_{k,1:N_I} \right|^2 \right]} \qquad (85)$$

Since the matrix $A$ has a single eigenvalue at $\lambda_1(A) = 1$, and its value is greater than the remaining eigenvalues, we can decompose the MSD in (85) into two components. The first component is determined by $\lambda_1(A)$, i.e., $k = 1$ in (85), and is denoted by $\mathrm{MSD}_{k=1}$. The second component is due to the contribution from the remaining eigenvalues of $A$, i.e., $k > 1$ in (85), and is denoted by $\mathrm{MSD}_{k>1}$. Since $\lambda_1(A) = 1$, and for sufficiently small step-sizes, we introduce the following approximation for the denominator in (85):

$$|\lambda_1(A)|^2 \cdot \left| 1 - \mu\lambda_m(R_u) \cdot s^T_{1,1:N_I} r_{1,1:N_I} \right|^2 \approx 1 - 2\mu\lambda_m(R_u) \cdot s^T_{1,1:N_I} r_{1,1:N_I} \qquad (86)$$

Then, the term $\mathrm{MSD}_{k=1}$ becomes

$$\mathrm{MSD}_{k=1} \approx \frac{\mu M \|r_1\|^2 \sum_{l=1}^{N_I} \sigma^2_{v,l}\, s^2_{1,l}}{2N \sum_{l=1}^{N_I} r_{1,l}\, s_{1,l}} \qquad (87)$$

For the second part, $\mathrm{MSD}_{k>1}$, since $|\lambda_k(A)| < 1$ for $k > 1$, and for sufficiently small step-sizes, the denominator in (85) can be approximated by:

$$1 - |\lambda_k(A)|^2 \cdot \left| 1 - \mu\lambda_m(R_u) \cdot s^T_{k,1:N_I} r_{k,1:N_I} \right|^2 \approx 1 - |\lambda_k(A)|^2 + 2\mu\lambda_m(R_u) |\lambda_k(A)|^2 \cdot s^T_{k,1:N_I} r_{k,1:N_I} \qquad (88)$$

Comparing to (86), we further ignore the term $2\mu\lambda_m(R_u)|\lambda_k(A)|^2 \cdot s^T_{k,1:N_I} r_{k,1:N_I}$ in (88) since this term is generally much less than $1 - |\lambda_k(A)|^2$, especially for well-connected networks, i.e., high value of $\bar{\eta}$ (see (97) further ahead). Then, using $\sum_{m=1}^{M} \lambda_m(R_u) = \mathrm{Tr}(R_u)$, $\mathrm{MSD}_{k>1}$ becomes

$$\mathrm{MSD}_{k>1} \approx \sum_{k=2}^{N} \frac{\mu^2\, \mathrm{Tr}(R_u) |\lambda_k(A)|^2 \cdot \|r_k\|^2 \cdot s^T_{k,1:N_I} \Sigma_{v,1:N_I} s_{k,1:N_I}}{N \left[ 1 - |\lambda_k(A)|^2 \right]} \qquad (89)$$



As shown by (85), (87), and (89), the network MSD depends strongly on the eigenvalues and eigenvectors of the combination matrix $A$. In the next section, we examine more closely the eigen-structure of the uniform combination matrix $A$ from (55). In a subsequent section, we employ the results to assess how the MSD varies with the proportion of informed nodes (see expressions (103) and (115) further ahead).

C. MSD Expression for the Uniform Combination Matrix from (55)

C.1) Eigenvalues of $A$: We start by examining the eigenvalues of the uniform combination matrix $A$ from (55). We define the Laplacian matrix, $L$, of a network graph as:

$$L = D - C \qquad (90)$$

in terms of the $D$ and $C$ from (56). Then, the normalized Laplacian matrix is defined as [36]:

$$\mathcal{L} \triangleq D^{-1/2} L D^{-1/2} = I - A_s \qquad (91)$$

where $A_s$ is the same matrix defined earlier in (58). From Lemma 3, we know that the matrices $A$ and $A_s$ have the same eigenvalues and we conclude that

$$\lambda_k(\mathcal{L}) = 1 - \lambda_k(A) \qquad (92)$$

In other words, the spectrum of $A$ is related to the spectrum of the normalized Laplacian matrix. There are useful results in the literature on the spectral properties of the Laplacian matrices for random graphs [36]-[39], such as the graphs corresponding to the Erdos-Renyi and scale-free models of Sec. IV. We shall use these results to infer properties about the spectral distribution of the corresponding combination matrices $A$ that are defined by (55). In particular, reference [36] gives an expression for the eigenvalue distribution of $\mathcal{L}$ for certain random graphs; this expression can be used to infer the eigenvalue distribution of $A$, as we now verify. First we note that one is an eigenvalue of $A$, i.e., $\rho(A) = \lambda_1(A) = 1$. In the following, we use the results of [36] to characterize the remaining eigenvalues (namely, $\lambda_k(A)$ for $k > 1$) of the uniform combination matrix.

Theorem 4 (Eigenvalue distribution of $A$). Let $\bar{n}_k$ denote the average degree of node $k$ in a random graph. Let

$$\bar{\eta} \triangleq \frac{1}{N} \sum_{k=1}^{N} \bar{n}_k \qquad (93)$$

denote the average degree of the graph. Then, for random graphs with expected degrees satisfying

$$\bar{n}_{\min} \triangleq \min_{1 \leq k \leq N} \{\bar{n}_k\} \gg \sqrt{\bar{\eta}} \qquad (94)$$

the density function, $f(\lambda)$, of the eigenvalues of $A$ converges in probability, as $N \to \infty$, to the semicircle law (see Fig. 2), i.e.,

$$f(\lambda) = \begin{cases} \dfrac{2}{\pi R} \sqrt{1 - \left( \dfrac{\lambda}{R} \right)^2}, & \text{if } \lambda \in [-R, R] \\ 0, & \text{otherwise} \end{cases} \qquad (95)$$

where

$$R = \frac{2}{\sqrt{\bar{\eta}}} \qquad (96)$$

Moreover, if $\bar{n}_{\min} \gg \sqrt{\bar{\eta}} \log(N)$, the second largest eigenvalue of $A$ converges almost surely to

$$|\lambda_2(A)| = R \qquad (97)$$

Proof: See Thms. 5 and 6 in [36]. ■

Simulations further ahead (see Fig. 2) show that expressions (95) and (97) provide accurate approximations for the Erdos-Renyi and scale-free network models described in Section IV. In addition, for ergodic distributions, the value of $\bar{\eta}$ in (93) will be close to its realization $\eta$ for large $N$, where $\eta$ is defined as

$$\eta \triangleq \frac{1}{N} \sum_{k=1}^{N} n_k \qquad (98)$$

In the following, we determine an expression for $|\lambda_k(A)|$ by using (95). To do so, we let $k$ denote the number of eigenvalues of $A$ that are greater than some value $y$ in magnitude for $0 < y < R$. Then, the value of $k$ is given by:

$$k = N \cdot \left[ 1 - \int_{-y}^{y} f(\lambda)\, d\lambda \right] = N \cdot g(y) \qquad (99)$$

where we denote the expression inside the brackets by $g(y)$. Note that the integral $\int_{-y}^{y} f(\lambda)\, d\lambda$ in (99) computes the proportion of eigenvalues of $A$ within the region $[-y, y]$. Then, the $k$th eigenvalue of $A$ can be approximated by evaluating the value of $y$ in (99), i.e.,

$$|\lambda_k(A)| \approx g^{-1}\left( \frac{k}{N} \right) \qquad (100)$$

From (95) and using the change of variables $\lambda / R = \sin\theta$, we obtain that $g(y)$ in (100) has the form:

$$g(y) = 1 - \frac{2}{\pi} \cdot \sin^{-1}\left( \frac{y}{R} \right) - \frac{2y}{\pi R} \sqrt{1 - \left( \frac{y}{R} \right)^2} \qquad (101)$$

In Fig. 2, we show the averaged distribution of $|\lambda_k(A)|$ for Erdos-Renyi and scale-free models over 30 experiments. We observe that for both network models, the theoretical results in (97) and (100) match well with simulations.

Fig. 2. Density function (left) for the eigenvalues of $A$ as given by (95) for $N \to \infty$, and averaged eigenvalues (right) of the combination matrix $A$ defined by (55) over 30 experiments with $\eta = 5$. The dashed line on the right represents theory from (100) and the dash-dot line represents the linear approximation given further ahead by (112).
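The eigenvalue approximation described above can be sketched numerically. The Python fragment below assumes the semicircle form $g(y) = 1 - \frac{2}{\pi}\left[\sin^{-1}(y/R) + (y/R)\sqrt{1 - (y/R)^2}\right]$ with $R = 2/\sqrt{\eta}$, inverts it by bisection (an implementation choice made here, not taken from the paper), and compares the result against the linear approximation $g^{-1}(x) \approx (2/\sqrt{\eta})(1 - x)$. All function names are hypothetical.

```python
import math

def semicircle_radius(eta):
    """Radius R = 2 / sqrt(eta) of the semicircle law."""
    return 2.0 / math.sqrt(eta)

def g(y, R):
    """Fraction of eigenvalues with magnitude larger than y, for 0 <= y <= R."""
    t = y / R
    return 1.0 - (2.0 / math.pi) * (math.asin(t) + t * math.sqrt(1.0 - t * t))

def g_inverse(x, R, tol=1e-12):
    """Invert g on [0, R] by bisection; g decreases from 1 at y=0 to 0 at y=R."""
    lo, hi = 0.0, R
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid, R) > x:
            lo = mid  # g(mid) is still too large, so the solution lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def eigenvalue_magnitude(k, N, eta):
    """Approximate |lambda_k(A)| as g^{-1}(k / N)."""
    return g_inverse(k / N, semicircle_radius(eta))

def linear_approximation(k, N, eta):
    """Linear approximation (2 / sqrt(eta)) * (1 - k / N)."""
    return semicircle_radius(eta) * (1.0 - k / N)

# Example: N = 200 nodes and eta = 5, as in the Fig. 2 setting.
mags = [eigenvalue_magnitude(k, 200, 5.0) for k in (2, 50, 100, 150)]
```

For $\eta = 5.13$, `semicircle_radius` returns roughly 0.883, which agrees with the value of $|\lambda_2(A)|$ reported for the Erdos-Renyi model with $p = 0.02$ in Table II.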



C.2) MSD Expression for $k = 1$: From (87), $\mathrm{MSD}_{k=1}$ depends on the eigenvectors $\{r_1, s_1\}$. For the uniform combination matrix $A$ in (55), it can be verified that the right eigenvector for $A_s$ defined in (58) corresponding to the eigenvalue one has the following form:

$$r_1^s = \frac{1}{\sqrt{N\eta}}\, \mathrm{col}\{\sqrt{n_1}, \ldots, \sqrt{n_N}\} \qquad (102)$$

Then, from (61) and (102), expression (87) becomes

$$\mathrm{MSD}_{k=1} \approx \frac{\mu M \sum_{l=1}^{N_I} \sigma^2_{v,l}\, n_l^2}{2N\eta \sum_{l=1}^{N_I} n_l} \qquad \text{(using uniform combination matrix (55))} \qquad (103)$$

Expression (103) reveals several interesting properties. First, we observe that the term $\mathrm{MSD}_{k=1}$ does not depend on the matrix $R_u$, which is also a property of the MSD expression for stand-alone adaptive filters [23]. Second, expression (103) is inversely proportional to the degree of the network realization, $\eta$. That is, when the network is more connected (e.g., higher values of $p$ and $m$ in the Erdos-Renyi and scale-free models), the network will have lower $\mathrm{MSD}_{k=1}$. Third, expression (103) depends on the distribution of informed nodes through its dependence on the degree and noise profile of the informed nodes. If the number of informed nodes increases by one, the value of $\mathrm{MSD}_{k=1}$ may increase or decrease (i.e., it does not necessarily decrease). This can be seen as follows. From (103) we see that $\mathrm{MSD}_{k=1}$ will decrease (and, hence, improve) only if

$$\frac{\sum_{l=1}^{N_I} \sigma^2_{v,l}\, n_l^2 + \sigma^2_{v,N_I+1}\, n^2_{N_I+1}}{\sum_{l=1}^{N_I} n_l + n_{N_I+1}} < \frac{\sum_{l=1}^{N_I} \sigma^2_{v,l}\, n_l^2}{\sum_{l=1}^{N_I} n_l} \qquad (104)$$

or, if the degree of the added node satisfies:

$$\sigma^2_{v,N_I+1}\, n_{N_I+1} < \frac{\sum_{l=1}^{N_I} \sigma^2_{v,l}\, n_l^2}{\sum_{l=1}^{N_I} n_l} \qquad (105)$$
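The effect of adding one informed node on $\mathrm{MSD}_{k=1}$ can be checked with a few lines of Python. The sketch below assumes the form $\mathrm{MSD}_{k=1} \approx \mu M \sum_l \sigma^2_{v,l} n_l^2 / (2N\eta \sum_l n_l)$, with sums over the informed nodes, and the condition $\sigma^2_{v,N_I+1}\, n_{N_I+1} < \sum_l \sigma^2_{v,l} n_l^2 / \sum_l n_l$ for improvement. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
def msd_k1(mu, M, N, eta, degrees, noise_vars):
    """MSD_{k=1} for informed nodes with the given degrees and noise variances."""
    num = sum(s * n * n for s, n in zip(noise_vars, degrees))
    return mu * M * num / (2.0 * N * eta * sum(degrees))

def adding_node_improves(degrees, noise_vars, n_new, var_new):
    """True if the new informed node satisfies the improvement condition."""
    ratio = sum(s * n * n for s, n in zip(noise_vars, degrees)) / sum(degrees)
    return var_new * n_new < ratio

# Two informed nodes with degrees 4 and 6 and equal noise variances.
degrees, noise = [4, 6], [0.01, 0.01]
base = msd_k1(0.01, 5, 200, 5.0, degrees, noise)

# A low-degree candidate (degree 3) versus a high-degree candidate (degree 8).
low = msd_k1(0.01, 5, 200, 5.0, degrees + [3], noise + [0.01])
high = msd_k1(0.01, 5, 200, 5.0, degrees + [8], noise + [0.01])
```

In this example the low-degree candidate satisfies the condition and lowers $\mathrm{MSD}_{k=1}$, while the high-degree candidate violates it and raises $\mathrm{MSD}_{k=1}$.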

C.3) MSD Expression for $k > 1$: For $\mathrm{MSD}_{k>1}$, we apply relation (61) and approximation (62). Then, expression (89) can be approximated by:

$$\mathrm{MSD}_{k>1} \approx \frac{\mu^2\, \mathrm{Tr}(R_u)}{N\eta} \sum_{k=2}^{N} \frac{\lambda_k^2(A)}{1 - \lambda_k^2(A)} \left[ \sum_{l=1}^{N_I} \sigma^2_{v,l}\, n_l\, (r^s_{k,l})^2 \right] \qquad (106)$$

where we replaced $\bar{\eta}$ by $\eta$ for large $N$. Expression (106) requires knowledge of the eigenvectors $\{r_k^s\}$ of $A_s$ in (58). Note that for $k = 1$ and from (102), we have

$$(r^s_{1,l})^2 = \frac{n_l}{N\eta} \approx \frac{1}{N} \qquad (107)$$

since the nodes have similar degree in the Erdos-Renyi model. We are therefore motivated to introduce the following approximation:

$$(r^s_{k,l})^2 \approx \frac{1}{N} \qquad (108)$$

for all $k$. Observe that expression (108) is independent of $k$, and we find that expression (106) simplifies to:

$$\mathrm{MSD}_{k>1} \approx \frac{\mu^2\, \mathrm{Tr}(R_u)}{N\eta} \left( \sum_{l=1}^{N_I} \sigma^2_{v,l}\, n_l \right) \cdot \frac{1}{N} \sum_{k=2}^{N} \frac{\lambda_k^2(A)}{1 - \lambda_k^2(A)} \qquad (109)$$

Furthermore, from (100), we can approximate the summation over $k$ in (109) by the following integral:

$$\frac{1}{N} \sum_{k=2}^{N} \frac{\lambda_k^2(A)}{1 - \lambda_k^2(A)} \approx \int_0^1 \frac{[g^{-1}(x)]^2}{1 - [g^{-1}(x)]^2}\, dx \qquad (110)$$

where we replaced $k/N$ by $x$. However, evaluating the integral in (110) is generally intractable. We observe though from the right plot in Fig. 2 that $|\lambda_k(A)|$ (and also $g^{-1}(k/N)$) decreases in a rather linear fashion for $k > 1$. Note that the function $g(y)$ in (101) has values 1 at $y = 0$ and 0 at $y = R \approx 2/\sqrt{\eta}$. We therefore approximate $g(y)$ by the linear function

$$g(y) \approx 1 - \frac{\sqrt{\eta}}{2} \cdot y \qquad (111)$$

Then,

$$g^{-1}(x) \approx \frac{2}{\sqrt{\eta}} (1 - x) \qquad (112)$$

and expression (110) becomes

$$\frac{1}{N} \sum_{k=2}^{N} \frac{\lambda_k^2(A)}{1 - \lambda_k^2(A)} \approx \int_0^1 \frac{4/\eta \cdot (1 - x)^2}{1 - 4/\eta \cdot (1 - x)^2}\, dx = \frac{4}{\eta} \cdot h\left( \frac{2}{\sqrt{\eta}} \right) \qquad (113)$$

Fig. 3. The function $h(a)$ (left) from (114) and the derivative of $a^2 h(a)/4$ with respect to $a$ (right).

where the function $h(a)$ is defined as

$$h(a) = \frac{1}{a^2} \left[ \frac{1}{2a} \log\left( \frac{1 + a}{1 - a} \right) - 1 \right] \qquad (114)$$

Substituting expression (113) into (109), we find that the MSD contributed by the remaining terms ($k > 1$) has the following form:

$$\mathrm{MSD}_{k>1} \approx \frac{4\mu^2\, \mathrm{Tr}(R_u)}{N\eta^2} \cdot h\left( \frac{2}{\sqrt{\eta}} \right) \cdot \sum_{l=1}^{N_I} \sigma^2_{v,l}\, n_l \qquad \text{(using uniform combination matrix (55))} \qquad (115)$$

Note that, in contrast to $\mathrm{MSD}_{k=1}$ in (103), $\mathrm{MSD}_{k>1}$ in (115) always increases when the number of informed nodes increases. Moreover, the function $h(a)$, shown in Fig. 3, has the following property.

Lemma 5. The function $h(a)$ defined in (114) is strictly increasing and convex in $a \in (0, 1)$.

Proof: From (113), we note that $h(a)$ can be written in the integral form:

$$h(a) = \int_0^1 \frac{x^2}{1 - a^2 x^2}\, dx \qquad (116)$$

Taking the derivative of $h(a)$ in (116) with respect to $a$, we obtain:

$$\frac{dh(a)}{da} = \int_0^1 \frac{2a x^4}{(1 - a^2 x^2)^2}\, dx > 0 \qquad (117)$$

for $a \in (0, 1)$. To show convexity, we take the second derivative of $h(a)$ for $a \in (0, 1)$ and find that

$$\frac{d^2 h(a)}{da^2} = \int_0^1 \frac{2x^4 + 6a^2 x^6}{(1 - a^2 x^2)^3}\, dx > 0 \qquad (118)$$

■



The result of Lemma 5 implies that when $\eta$ (or, $p$ or $m$) increases, $\mathrm{MSD}_{k>1}$ in (115) decreases. That is, in a manner similar to $\mathrm{MSD}_{k=1}$ in (103), the value of $\mathrm{MSD}_{k>1}$ is lower if the network is more connected. In addition, we observe that when $\eta$ is too low (or, $a$ is too large in Fig. 3), the value of $h(2/\sqrt{\eta})$ will increase rapidly and so does the value of $\mathrm{MSD}_{k>1}$. Note from (115) that $\mathrm{MSD}_{k>1}$ depends on $\eta$ through the function $h(2/\sqrt{\eta})/\eta$, or equivalently, $a^2 h(a)/4$ by replacing $2/\sqrt{\eta}$ with $a$. We show the derivative of $a^2 h(a)/4$ with respect to $a$ in the right plot of Fig. 3. It is seen that the derivative function increases rapidly beyond $a = 0.8$. To maintain acceptable levels of accuracy, it is preferable for the derivative to be bounded by a relatively small value, say, 0.5. Then, the value of $a$ should be less than 0.8, or $\eta > 6.25$. That is, the average neighborhood sizes should be kept around 6-7 or larger.
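The numbers quoted above can be reproduced with a short Python sketch. It assumes the integral form $h(a) = \int_0^1 x^2/(1 - a^2 x^2)\, dx$, whose closed form is $h(a) = \frac{1}{a^2}\left[\frac{1}{2a}\log\frac{1+a}{1-a} - 1\right]$; this reading of the definition is an assumption based on the surrounding derivation. The midpoint-rule integration and the central-difference derivative are implementation choices made here, and the function names are hypothetical.

```python
import math

def h(a):
    """Closed form of h(a) for 0 < a < 1 (assumed definition, see lead-in)."""
    return (math.log((1.0 + a) / (1.0 - a)) / (2.0 * a) - 1.0) / (a * a)

def h_integral(a, steps=20000):
    """Midpoint-rule estimate of the integral of x^2 / (1 - a^2 x^2) over [0, 1]."""
    dx = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += x * x / (1.0 - (a * x) ** 2)
    return total * dx

def scaled_derivative(a, eps=1e-6):
    """Central-difference derivative of a^2 h(a) / 4 with respect to a."""
    f = lambda t: t * t * h(t) / 4.0
    return (f(a + eps) - f(a - eps)) / (2.0 * eps)

# a = 0.8 corresponds to eta = 4 / a^2 = 6.25.
eta_at_threshold = 4.0 / 0.8 ** 2
slope_at_threshold = scaled_derivative(0.8)
```

Under these assumptions the derivative of $a^2 h(a)/4$ at $a = 0.8$ stays below 0.5, which is consistent with the guideline $\eta > 6.25$ stated in the text.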

D. Convergence Rate Expression 

From (66), $|\lambda_{k,m}(\mathcal{B})|$ can be expressed as:

$$|\lambda_{k,m}(\mathcal{B})| = |\lambda_k(A)| \cdot \left| 1 - \mu\lambda_m(R_u) \cdot s^T_{k,1:N_I} r_{k,1:N_I} \right| \qquad (119)$$

Since $|\lambda_k(A)| < |\lambda_1(A)| = 1$ for $k > 1$, and for sufficiently small step-sizes, the maximum value of $|\lambda_{k,m}(\mathcal{B})|$ (namely, $\rho(\mathcal{B})$) occurs when $k = 1$. Recall that all entries of $r_1$ and $s_1$ are positive, which implies that $|\lambda_{1,m}(\mathcal{B})|$ increases as $m$ increases (i.e., smaller $\lambda_m(R_u)$). Then, we arrive at the following expression for $\rho(\mathcal{B})$:

$$\rho(\mathcal{B}) = |\lambda_{1,M}(\mathcal{B})| = 1 - \mu\lambda_M(R_u) \cdot s^T_{1,1:N_I} r_{1,1:N_I} \qquad (120)$$

The square of this expression determines the rate of convergence of the ATC diffusion strategy. Note that expression (120) satisfies Lemmas 1 and 2. For the uniform $A$ in (55), we obtain from (61), (102), and (120) that

$$\rho(\mathcal{B}) = 1 - \mu\lambda_M(R_u) \cdot \frac{\sum_{l=1}^{N_I} n_l}{N\eta} \qquad \text{(using uniform combination matrix (55))} \qquad (121)$$

Expression (121) can be motivated intuitively by noting that the decay of $\rho(\mathcal{B})$ will be larger as informed nodes have higher degrees. Simulations further ahead show that expression (121) matches well with simulated results.
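To make the dependence on the degrees of the informed nodes concrete, the following Python sketch evaluates the convergence rate $r = \rho^2(\mathcal{B})$ under the assumed form $\rho(\mathcal{B}) = 1 - \mu\lambda_M(R_u) \sum_{l \in \mathcal{N}_I} n_l / (N\eta)$. The function name and the toy degree sequence are hypothetical.

```python
def convergence_rate(mu, lambda_min, informed_degrees, all_degrees):
    """Return r = rho(B)^2 for the uniform combination matrix.

    lambda_min is the smallest eigenvalue of R_u, informed_degrees lists
    the degrees of the informed nodes, and all_degrees lists every degree.
    """
    N = len(all_degrees)
    eta = sum(all_degrees) / N  # average degree of this realization
    rho = 1.0 - mu * lambda_min * sum(informed_degrees) / (N * eta)
    return rho ** 2

degrees = [8, 6, 4, 2]  # toy network, sorted by decreasing degree
r_all = convergence_rate(0.01, 0.8, degrees, degrees)      # every node informed
r_hub = convergence_rate(0.01, 0.8, degrees[:1], degrees)  # only the hub informed
r_leaf = convergence_rate(0.01, 0.8, degrees[-1:], degrees)  # only the leaf informed
```

A smaller value of $r$ means faster convergence, so in this toy example informing the highest-degree node alone converges faster than informing the lowest-degree node alone, and informing all nodes is fastest.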

E. Behavior of the ATC Network 

Combining expressions (103), (115), and (121), we arrive at the following result for ATC diffusion networks.

Theorem 5 (Network MSD under uniform combination weights). The ATC network with uniform step-sizes and regression covariance matrices ($\mu_k = \mu$ and $R_{u,k} = R_u$) and with the uniform combination matrix $A$ in (55) has approximate convergence rate:

$$r \approx \left[ 1 - \mu\lambda_M(R_u) \cdot \frac{\sum_{l \in \mathcal{N}_I} n_l}{N\eta} \right]^2 \qquad (122)$$

and approximate network MSD:

$$\mathrm{MSD} \approx \frac{\mu M \sum_{l \in \mathcal{N}_I} \sigma^2_{v,l}\, n_l^2}{2N\eta \sum_{l \in \mathcal{N}_I} n_l} + \frac{4\mu^2\, \mathrm{Tr}(R_u)}{N\eta^2} \cdot h\left( \frac{2}{\sqrt{\eta}} \right) \cdot \sum_{l \in \mathcal{N}_I} \sigma^2_{v,l}\, n_l \qquad (123)$$

where $\eta$ and $h(\cdot)$ are defined in (98) and (114), respectively. ■

Note that the summations in (122) and (123) are over the set of informed nodes, $\mathcal{N}_I$. Expressions (122) and (123) reveal important information about the behavior of the network. First, the convergence rate in (122) and the network MSD in (123) depend on the network topology only through the node degrees, $\{n_l\}$, and the network degree, $\eta$. In general, the higher the value of $\eta$ is, the slower the convergence rate is (an undesirable effect) and the lower the network MSD is (a desirable effect). Second, as the set of informed nodes, $\mathcal{N}_I$, increases, we observe from (122) that the rate of convergence becomes faster (a desirable effect). However, as we will illustrate in simulations, the behavior of the terms $\mathrm{MSD}_{k=1}$ and $\mathrm{MSD}_{k>1}$ ends up causing the network MSD given by (123) to increase (an undesirable effect) as $\mathcal{N}_I$ increases. Figure 4 illustrates the general trend in the behavior of the network MSD and its components, $\mathrm{MSD}_{k=1}$ and $\mathrm{MSD}_{k>1}$. Two scenarios are shown in the figure, corresponding to whether the added informed nodes satisfy (105) or not. The figure shows that depending on condition (105), the curve for $\mathrm{MSD}_{k=1}$ can increase or decrease with $N_I$. Nevertheless, the overall network MSD generally increases (i.e., becomes worse) with increasing $N_I$. These facts reveal an important trade-off between the convergence rate and the network MSD in relation to the proportion of informed nodes. We summarize the behavior of the ATC network in Table I and show how the rate of convergence and the MSD respond when the parameters $\{\eta, N_I, \mathrm{Tr}(R_u)\}$ increase. We remark that slower convergence rate and worse estimation correspond to increasing values of $r$ and MSD (an undesirable effect).

For a proper evaluation of how the proportion of informed nodes influences network behavior, we shall adjust the step-size parameter such that the convergence rate remains fixed as the set of informed nodes is enlarged and then compare the resulting network MSDs. To do so, we set the step-size to the following normalized value:
$$\mu = \frac{\mu_0}{\sum_{l \in \mathcal{N}_I} n_l} \qquad (124)$$

Fig. 4. Sketch of the behavior of the network MSD as a function of the number of informed nodes, $N_I$, depending on whether relation (105) is satisfied (left) or not (right).

TABLE I
Behavior of the ATC network in response to increases in any of the parameters $\{\eta, N_I, \mathrm{Tr}(R_u)\}$

| Parameter | Convergence rate $r$ (122) | MSD (123) | $\mathrm{MSD}_{k=1}$ (103) | $\mathrm{MSD}_{k>1}$ (115) |
| $N_I \uparrow$ | faster | worse in general | may be better or worse (see (105)) | worse |
| $\eta \uparrow$ | slower | better | better | better |
| $\mathrm{Tr}(R_u) \uparrow$ | faster | worse | independent of $\mathrm{Tr}(R_u)$ | worse |


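for some $\mu_0 > 0$. Note that this choice normalizes $\mu_0$ by the sum of the degrees of the informed nodes. In this way, the convergence rate given by (122) becomes

$$r \approx \left[ 1 - \frac{\mu_0 \lambda_M(R_u)}{N\eta} \right]^2 \qquad (125)$$

which is independent of the set of informed nodes. Moreover, the network MSD in (123) becomes

$$\mathrm{MSD} \approx \frac{\mu_0 M \sum_{l \in \mathcal{N}_I} \sigma^2_{v,l}\, n_l^2}{2N\eta \left( \sum_{l \in \mathcal{N}_I} n_l \right)^2} + \frac{4\mu_0^2\, \mathrm{Tr}(R_u)}{N\eta^2} \cdot h\left( \frac{2}{\sqrt{\eta}} \right) \cdot \frac{\sum_{l \in \mathcal{N}_I} \sigma^2_{v,l}\, n_l}{\left( \sum_{l \in \mathcal{N}_I} n_l \right)^2} \qquad (126)$$

Using the same argument we used before in (104), if we increase the number of informed nodes by one, the first term in (126) (namely, $\mathrm{MSD}_{k=1}$) will increase if the degree of the added node satisfies:

$$n_{N_I+1} > c_1 \sum_{l \in \mathcal{N}_I} n_l, \qquad c_1 \triangleq 2 \left[ \frac{\sigma^2_{v,N_I+1} \left( \sum_{l \in \mathcal{N}_I} n_l \right)^2}{\sum_{l \in \mathcal{N}_I} \sigma^2_{v,l}\, n_l^2} - 1 \right]^{-1} \qquad (127)$$

and the second term in (126) (namely, $\mathrm{MSD}_{k>1}$) will increase if the degree of the added node satisfies:

$$n_{N_I+1} < c_2 \sum_{l \in \mathcal{N}_I} n_l, \qquad c_2 \triangleq \frac{\sigma^2_{v,N_I+1} \sum_{l \in \mathcal{N}_I} n_l}{\sum_{l \in \mathcal{N}_I} \sigma^2_{v,l}\, n_l} - 2$$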
(128)

In the following, we show that there exist conditions under which both requirements (127) and (128) are satisfied. When this happens, the network MSD ends up increasing (an undesirable effect) when we add one more informed node to the network. In the first example, we assume that the degrees of all nodes are the same, i.e., we set $n_l = n$ for all $l$. Then, $c_1$ and $c_2$ in (127)-(128) become

$$c_1 = 2(N_I \beta - 1)^{-1}, \qquad c_2 = \beta - 2 \qquad (129)$$

where

$$\beta = \frac{\sigma^2_{v,N_I+1}}{\sum_{l \in \mathcal{N}_I} \sigma^2_{v,l} / N_I} \qquad (130)$$

It can be verified that if

$$\beta > 2 + \frac{1}{N_I} \qquad (131)$$

(or, if the noise variance at the added node is large enough), both (127) and (128) are satisfied and then the MSD will increase (i.e., become worse). A second example is obtained by setting the noise variances to a constant level, i.e., $\sigma^2_{v,l} = \sigma^2_v$ for all $l$. Then, $c_1$ and $c_2$ in (127)-(128) become

$$c_1 = 2 \left[ \frac{\left( \sum_{l \in \mathcal{N}_I} n_l \right)^2}{\sum_{l \in \mathcal{N}_I} n_l^2} - 1 \right]^{-1}, \qquad c_2 = -1 \qquad (132)$$

In this case, the second term in (126) always decreases, whereas the first term in (126) will increase if the degree of the added informed node is high enough. However, as the number of informed nodes increases, the step-size in (124) will become smaller and the first term in (126) becomes dominant. As a result, the network MSD worsens if (127) is satisfied, i.e., when the added node has large degree. These results suggest that it is beneficial to let a few highly noisy or highly connected nodes remain uninformed and participate only in the consultation step (5b).
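The first example (equal degrees) can be verified numerically. The Python sketch below assumes that, under the normalized step-size, the two MSD terms scale as $\sum_l \sigma^2_{v,l} n_l^2 / (\sum_l n_l)^2$ and $\sum_l \sigma^2_{v,l} n_l / (\sum_l n_l)^2$ over the informed nodes; the constant prefactors (including the value of $h(2/\sqrt{\eta})$, passed in as `h_val`) follow a reconstruction of the expressions and should be treated as assumptions. All names and numbers are hypothetical.

```python
def msd_terms(mu0, M, trace_Ru, N, eta, h_val, degrees, noise_vars):
    """Return (MSD_{k=1}, MSD_{k>1}) under the normalized step-size mu0 / sum(degrees)."""
    s1 = sum(degrees)
    term1 = mu0 * M * sum(s * n * n for s, n in zip(noise_vars, degrees)) \
        / (2.0 * N * eta * s1 ** 2)
    term2 = 4.0 * mu0 ** 2 * trace_Ru * h_val \
        * sum(s * n for s, n in zip(noise_vars, degrees)) / (N * eta ** 2 * s1 ** 2)
    return term1, term2

def beta(noise_vars, var_new):
    """Ratio of the new node's noise variance to the informed nodes' average."""
    return var_new / (sum(noise_vars) / len(noise_vars))

# Four informed nodes with equal degree 5 and equal noise variance 0.01.
degrees, noise = [5] * 4, [0.01] * 4
args = (0.1, 5, 6.5, 200, 5.0, 0.7)  # mu0, M, Tr(R_u), N, eta, h(2/sqrt(eta)): illustrative
before = msd_terms(*args, degrees, noise)

# Threshold is beta > 2 + 1/N_I = 2.25: try a noisy node (beta = 3) and a typical one (beta = 1).
noisy = msd_terms(*args, degrees + [5], noise + [0.03])
typical = msd_terms(*args, degrees + [5], noise + [0.01])
```

With $\beta = 3$ both terms grow after the new node is informed, whereas with $\beta = 1$ both terms shrink, in line with the threshold $\beta > 2 + 1/N_I$.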



VI. Simulation Results 

We consider networks with 200 nodes. The weight vector, $w^o$, is a randomly generated $5 \times 1$ vector (i.e., $M = 5$). The regressor covariance matrix $R_u$ is a diagonal matrix with each diagonal entry uniformly generated from $[0.8, 1.8]$, and noise variances are set to $\sigma^2_{v,k} = 0.01$ for all $k$. The step-size for informed nodes is set to $\mu = 0.01$. Without loss of generality, we assume that the nodes are indexed in decreasing order of degree, i.e., $n_1 \geq n_2 \geq \cdots \geq n_N$.

We first verify theoretical expressions (33) and (34) for the network MSD and convergence rate. Figure 5 shows the MSD over time for two network models with parameters $p = 0.02$, $m = 2$, and $N_0 = 10$. For each network model, we consider two cases: 200 or 50 (randomly selected) informed nodes. We observe that when the number of informed nodes decreases, the convergence rate increases, as expected, but interestingly, the MSD decreases. The theoretical results are also depicted in Fig. 5. The MSD decays at rate $r$ in (34) during the transient stage. When the MSD is lower than the steady-state MSD value from (33), the MSD stays constant at (33). We observe that the theoretical results match well with simulations. The theoretical results (33) and (34) will be used to verify the effectiveness of the approximate expressions (122) and (123).

Fig. 5. Transient network MSD over the Erdos-Renyi (left) and scale-free (right) networks with 200 nodes. The dashed lines represent the theoretical results (33) and (34).

We examine the effect of the proportion and distribution of informed nodes on the convergence rate and MSD of the network. We increase the number of informed nodes from the node with the highest degree, i.e., from node 1 to node $N$. The convergence rate and MSD are shown in Fig. 6. For each model, we consider two possible values of parameters: $p = 0.02$ and $0.075$ in the Erdos-Renyi model and $m = 2$ and $8$ in the scale-free model. Simulation results are averaged over 30 experiments. Note from (48) and (50) that the two models have similar network degree. As expected, the convergence rates decrease when we add more informed nodes and expression (122) matches well with expression (34). In addition, the convergence rates in the scale-free model are lower in the beginning because there are some nodes with very high degrees.

Interesting patterns are seen in the MSD behavior. We show $\mathrm{MSD}_{k=1}$ from (103) and $\mathrm{MSD}_{k>1}$ from (115) in Fig. 7. We observe from Fig. 7 that $\mathrm{MSD}_{k=1}$ decreases, whereas $\mathrm{MSD}_{k>1}$ increases with $N_I$. If two network models have similar degree, the scale-free model will have higher values of $\mathrm{MSD}_{k=1}$ and $\mathrm{MSD}_{k>1}$ than the Erdos-Renyi model, and therefore higher values of MSD. This is because the scale-free model has higher values of $n_1$. Since $\mathrm{MSD}_{k=1}$ decreases and $\mathrm{MSD}_{k>1}$ increases, the resulting MSD in (123) can either increase or decrease. The curve of MSD depends on the values of $\mathrm{MSD}_{k=1}$



TABLE II
Network degree and $|\lambda_2(A)|$ for two network models

| | Erdos-Renyi ($p$) | Erdos-Renyi ($p$) | Scale-free ($m$) | Scale-free ($m$) |
| Parameter ($p$ or $m$) | 0.02 | 0.075 | 2 | 8 |
| $\eta$ | 5.13 | 15.83 | 4.93 | 16.33 |
| $|\lambda_2(A)|$ | 0.883 | 0.503 | 0.900 | 0.495 |

Fig. 6. Convergence rate (left) and steady-state MSD (right) for Erdos-Renyi and scale-free models with the addition of informed nodes in decreasing order of degree. The dashed lines represent approximate expressions (122) and (123).



and $\mathrm{MSD}_{k>1}$. We observe from Fig. 6 that in most cases, the MSD decreases when $N_I$ is small, and then increases with $N_I$. As in the case of a stand-alone adaptive filter, there exists a trade-off between the convergence rate and the MSD. Interestingly, for the scale-free model with higher values of $m$, we see from Fig. 6 that the MSD decreases with $N_I$. We also see that the approximation for the MSD in (123) matches well with expression (33).

Fig. 7. $\mathrm{MSD}_{k=1}$ (left) and $\mathrm{MSD}_{k>1}$ (right) for Erdos-Renyi and scale-free models with the addition of informed nodes in decreasing order of degree. The dashed lines represent approximate expressions (103) and (115).



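expression (??) becomes

$$\mathrm{MSD} \approx \frac{\mu M \sigma_v^2}{2N^2\eta^2} \sum_{l=1}^{N} n_l^2 + \frac{4\mu^2\, \mathrm{Tr}(R_u)\, \sigma_v^2}{\eta} \cdot h\left( \frac{2}{\sqrt{\eta}} \right) \qquad (133)$$

From the Cauchy-Schwarz inequality and for a fixed value of $\eta$, we know that

$$\sum_{l=1}^{N} n_l^2 \geq \frac{1}{N} \left( \sum_{l=1}^{N} n_l \right)^2 = N\eta^2 \qquad (134)$$

with equality if, and only if, $n_l = \eta$ for all $l$, i.e., all nodes have the same degree. Then, we obtain a lower bound for the MSD:

$$\mathrm{MSD} \geq \frac{\mu M \sigma_v^2}{2N} + \frac{4\mu^2\, \mathrm{Tr}(R_u)\, \sigma_v^2}{\eta} \cdot h\left( \frac{2}{\sqrt{\eta}} \right) \qquad (135)$$

Since the nodes in the Erdos-Renyi model have similar degree, it will achieve lower MSD than the scale-free model if all nodes are informed.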



A. MSD with Fixed Convergence Rate 

We vary the value of the step-size as in (124) with $\mu_0 = 0.1$ and show the network MSD over the number of informed nodes in Fig. 8. To show that the MSD possibly increases with $N_I$, we reverse the order in adding informed nodes, i.e., from node $N$ to node 1. It is interesting to note that for the scale-free model, the MSD increases when the number of informed nodes is large. This is because in the scale-free model, there are a few nodes connected to most nodes in the network and condition (127) is satisfied. The results suggest that in the scale-free model, we should let a few highly connected nodes remain uninformed and perform only the consultation step (5b).

Fig. 8. Steady-state MSD with the deployment from node $N$ to node 1 for Erdos-Renyi and scale-free models. The dashed lines represent approximate expression (126).



VII. Concluding Remarks 

In this paper, we derived useful expressions for the convergence rate and mean-square performance 
of adaptive networks. The analysis examines analytically how the convergence rate and mean-square 
performance of the network vary with the degrees of the nodes, with the network degree, and with the 
proportion of informed nodes. The results reveal interesting and surprising patterns of behavior. The 
analysis shows that there exists a trade-off between convergence rate and mean-square performance in 
terms of the proportion of informed nodes. It is not always the case that increasing the proportion of 
informed nodes is beneficial. 
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